GRADIENT FLOWS OF THE ENTROPY FOR JUMP PROCESSES

Abstract. We introduce a new transportation distance between probability
measures on $\mathbb{R}^d$ that is built from a Lévy jump kernel. It is defined
via a non-local variant of the Benamou-Brenier formula. We study geometric and
topological properties of this distance; in particular, we prove existence of
geodesics. For translation invariant jump kernels we identify the semigroup
generated by the associated non-local operator as the gradient flow of the
relative entropy w.r.t. the new distance and show that the entropy is convex
along geodesics.

1. Introduction

In the last two decades the theory of optimal transportation has found 
applications to many areas of mathematics such as partial differential equa- 
tions, geometry and probability. We refer the reader to the monograph [27] 
for an overview. In particular, optimal transport has proved very useful 
in the study of diffusion processes. One of the most striking examples is 
Otto's discovery [18, 24] that many diffusion equations can be interpreted 
as gradient flows of a suitable free energy functional with respect to the 
$L^2$-Wasserstein distance on the space of probability measures. A prominent
example is the heat equation which is the gradient flow of the Shannon 
entropy. By now, similar interpretations of the heat flow have been estab- 
lished in a variety of settings ranging from Riemannian manifolds to abstract 
metric measure spaces, see [13, 23, 15, 17, 2]. 
The aim of this article is to build a bridge between the theory of jump
processes and non-local operators on the one hand and ideas from optimal
transportation on the other. We will give a gradient flow interpretation of
the equation

\[ \partial_t u = \mathcal{L}u , \tag{1.1} \]

where $\mathcal{L}$ is a non-local operator given by

\[ \mathcal{L}u(x) = \int \big( u(y) - u(x) - (y-x)\cdot\nabla u(x)\,\mathbf{1}_{\{|y-x|\le 1\}} \big)\, J(x,dy) , \]

with a Lévy measure $J(x,dy)$ for every $x\in\mathbb{R}^d$. Such operators arise
as the generators of pure jump Feller processes. For this purpose the
Wasserstein distance is not appropriate. The main contribution of this article
is thus the construction of a new transportation distance on the space of
probability measures that is non-local in nature and allows one to interpret
equation (1.1) formally as the gradient flow of the relative entropy. We define
this distance via a non-local variant of the dynamical characterization of the
Wasserstein distance by Benamou and Brenier [7]. A prominent example we will
often consider is given by the choice
$J_\alpha(x,dy) = c_\alpha |y-x|^{-\alpha-d}\,dy$ with $\alpha\in(0,2)$,
corresponding to the fractional Laplacian $\mathcal{L} = -(-\Delta)^{\alpha/2}$,
which is a pseudo-differential operator with symbol $|\xi|^\alpha$. For
translation invariant jump kernels such as $J_\alpha$, where the underlying jump
process is a Lévy process, we rigorously identify the equation as the gradient
flow of the entropy w.r.t. the new distance in the framework of gradient flows
in metric spaces developed in [1]. Moreover, we show that the entropy is convex
along geodesics.

To motivate our interest in such a link between jump processes and opti- 
mal transport, let us highlight two observations. 

The gradient flow approach has been used as a powerful tool in the study 
of many evolution partial differential equations. Already in Otto's original 
work [24] convexity properties of the entropy functional have been used to 
derive explicit rates of convergence to equilibrium for the porous medium 
equation. This approach is also well adapted to the study of functional 
inequalities, such as logarithmic Sobolev inequalities (see e.g. the famous 
result by Otto-Villani [25]). Recently, it has been shown that the gradient 
flow characterization provides a good framework to study stability prop- 
erties of diffusion processes under changes of the driving potential or the 
underlying geometry [3], [16]. 

The regularity theory for elliptic and parabolic equations involving non- 
local operators is under active development including both analytic and 
probabilistic approaches (see e.g. [9], [6] and references therein). In a local 
setting very precise regularity results can be obtained using a lower bound 
on the Ricci curvature of the operator in the sense of the Bakry-Emery 
criterion [5]. Equivalently, such curvature information can be encoded into 
convexity properties of the entropy along Wasserstein geodesics. In fact,
geodesic convexity of the entropy has been used as a synthetic notion of a 
lower Ricci curvature bound for metric measure spaces by Lott-Villani [19] 
and Sturm [26]. In this sense the approach presented here could be used to
define an alternative notion of curvature in the spirit of Lott-Villani-Sturm
that might be better adapted to certain situations than the non-local
$\Gamma_2$-calculus. In the discrete setting of finite Markov chains, this
approach has already been used in [14] to derive new functional inequalities.

Modifications of the Wasserstein distance have been considered recently 
by a number of authors. In [12] Dolbeault, Nazaret and Savaré proposed a
new class of transport distances based on an adaptation of the Benamou- 
Brenier formula to give a gradient flow interpretation to a class of transport 
equations with non-linear mobilities. Very recently, Maas [20] (see also [22], 
[10] for independent related work by Mielke and Chow et al.) introduced 
a distance between probability measures on a discrete space equipped with 
a Markov kernel such that the law of the continuous time Markov chain 
evolves as the gradient flow of the entropy. Our approach is very similar in 
spirit to the work of Maas and generalizes it to a certain extent. On the
technical side we use an adaptation of the techniques developed in [12] to 
our non-local setting. 

Main results. Let us now discuss the content of this article in more detail. 
Let $(J(x,\cdot),\, x\in\mathbb{R}^d)$ be a jump kernel. By this we mean that
for all $x\in\mathbb{R}^d$, $J(x,\cdot)$ is a Radon measure on
$\mathbb{R}^d\setminus\{x\}$ depending measurably on $x$. Throughout this text
$J$ shall satisfy the following

Assumption 1.1. For every bounded continuous function
$f:\mathbb{R}^d\to\mathbb{R}$ the mapping

\[ x \mapsto \int f(y)\,\big(1\wedge|x-y|^2\big)\,J(x,dy) \]

is again bounded and continuous.

In particular $(J(x,\cdot),\, x\in\mathbb{R}^d)$ is a so-called Lévy kernel (see
e.g. [4, Ch. 3.5]). Further let $m$ be a Radon measure on $\mathbb{R}^d$. We
assume that $J$ is reversible w.r.t. $m$, i.e. the measure $J(x,dy)\,m(dx)$ is
symmetric.

We denote by $\mathscr{P}(\mathbb{R}^d)$ the space of Borel probability measures
on $\mathbb{R}^d$. Given $\mu\in\mathscr{P}(\mathbb{R}^d)$ we define its relative
entropy w.r.t. $m$ by

\[ \mathcal{H}(\mu) = \int \rho\log\rho\,dm \]

if $\mu$ is absolutely continuous w.r.t. $m$ with density $\rho$ and
$(\rho\log\rho)_+$ is integrable. Otherwise we set $\mathcal{H}(\mu) = +\infty$.

A non-local transportation distance. Let us first motivate the construction of
our new metric by recalling the dynamical characterization of the
$L^2$-Wasserstein distance. The Benamou-Brenier formula [7] asserts that for two
probability densities $\bar\rho_0,\bar\rho_1$ on $\mathbb{R}^d$ we have

\[ W_2^2(\bar\rho_0,\bar\rho_1) = \inf_{\rho,\psi} \int_0^1\!\!\int |\nabla\psi_t(x)|^2\,\rho_t(x)\,dx\,dt , \tag{1.2} \]

where the infimum is taken over all sufficiently smooth functions
$\rho:[0,1]\times\mathbb{R}^d\to\mathbb{R}_+$ and
$\psi:[0,1]\times\mathbb{R}^d\to\mathbb{R}$ subject to the continuity equation

\[ \begin{cases} \partial_t\rho + \nabla\cdot(\rho\nabla\psi) = 0 , \\ \rho_0 = \bar\rho_0 ,\quad \rho_1 = \bar\rho_1 . \end{cases} \tag{1.3} \]

Here we will define a (pseudo-)metric (i.e. possibly attaining the value
$+\infty$) on $\mathscr{P}(\mathbb{R}^d)$ by giving a non-local analogue of
formulas (1.2) and (1.3). In order to obtain a metric with the desired
properties it is necessary to introduce a function
$\theta:\mathbb{R}_+\times\mathbb{R}_+\to\mathbb{R}_+$ satisfying Assumption 2.1
below and to consider the mean $\hat\rho(x,y) := \theta(\rho(x),\rho(y))$ of a
given density $\rho:\mathbb{R}^d\to\mathbb{R}_+$ at different points. We will be
mostly interested in the logarithmic mean

\[ \theta(s,t) = \frac{s-t}{\log s-\log t} , \tag{1.4} \]

but for future use we allow for more generality in the construction. For a
function $\psi:\mathbb{R}^d\to\mathbb{R}$ we will denote by
$\bar\nabla\psi(x,y) = \psi(y)-\psi(x)$ its discrete gradient. Following the
approach of [20] one is led to consider the following 'distance'. Given
probability measures $\bar\mu_0 = \bar\rho_0 m$ and $\bar\mu_1 = \bar\rho_1 m$
set

\[ W(\bar\mu_0,\bar\mu_1)^2 := \inf_{\rho,\psi}\ \frac12\int_0^1\!\!\int |\bar\nabla\psi_t(x,y)|^2\,\hat\rho_t(x,y)\,J(x,dy)\,m(dx)\,dt , \tag{1.5} \]

where the infimum is now taken over all functions $\rho$ and $\psi$ satisfying
the 'continuity equation'

\[ \begin{cases} \partial_t\rho_t + \bar\nabla\cdot(\hat\rho_t\bar\nabla\psi_t) = 0 , \\ \rho_0 = \bar\rho_0 ,\quad \rho_1 = \bar\rho_1 , \end{cases} \tag{1.6} \]

in the sense that for every test function
$\varphi\in C_c^\infty(\mathbb{R}^d)$ we have

\[ \int \varphi\,\partial_t\rho_t(x)\,m(dx) - \frac12\int \bar\nabla\varphi(x,y)\,\bar\nabla\psi_t(x,y)\,\hat\rho_t(x,y)\,J(x,dy)\,m(dx) = 0 . \]

Instead of addressing the variational problem (1.5) directly we will adopt a 
measure theoretic point of view and recast it in the more natural relaxed 
setting of time-dependent families of Radon measures. Let us briefly sketch 
this approach. 

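We let $G = \{(x,y)\in\mathbb{R}^d\times\mathbb{R}^d : x\neq y\}$ and fix
$\gamma(dx,dy) = J(x,dy)\,m(dx)$. We replace $\rho$ by a continuous curve
$t\mapsto\mu_t = \rho_t m$ in $\mathscr{P}(\mathbb{R}^d)$, and $\psi_t$ induces a
family of signed Radon measures
$\nu_t(dx,dy) = \bar\nabla\psi_t(x,y)\,\hat\rho_t(x,y)\,\gamma(dx,dy)$ on $G$.
The couple $(\mu,\nu)$ now satisfies the linear equation

\[ \begin{cases} \partial_t\mu_t + \bar\nabla\cdot\nu_t = 0 , \\ \mu_0 = \bar\mu_0 ,\quad \mu_1 = \bar\mu_1 , \end{cases} \tag{1.7} \]

which we understand in the sense of distributions, i.e. for all test functions
$\varphi\in C_c^\infty((0,1)\times\mathbb{R}^d)$:

\[ \int_0^1\!\!\int \partial_t\varphi\,d\mu_t\,dt + \frac12\int_0^1\!\!\int \bar\nabla\varphi(x,y)\,\nu_t(dx,dy)\,dt = 0 . \]

The quantity to be minimized in (1.5) can now be rewritten as

\[ \frac12\int_0^1\!\!\int \Big|\frac{d\nu_t}{d\gamma}\Big|^2\, \theta\Big(\frac{d\mu_t}{dm}(x),\frac{d\mu_t}{dm}(y)\Big)^{-1} \gamma(dx,dy)\,dt . \]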

We will define a distance $\mathcal{W}$ by proceeding as follows. To any
$\mu\in\mathscr{P}(\mathbb{R}^d)$ we associate two Radon measures on $G$ by
setting $\mu^1(dx,dy) = J(x,dy)\,\mu(dx)$ and $\mu^2(dx,dy) = J(y,dx)\,\mu(dy)$.
Given a Radon measure $\nu$ on $G$ we choose a reference measure $\sigma$ on $G$
such that $\nu = w\sigma$ and $\mu^i = \rho^i\sigma$, $i=1,2$, are all absolutely
continuous w.r.t. $\sigma$. Then we define the action functional by

\[ \mathcal{A}(\mu,\nu) := \frac12\int \Big|\frac{d\nu}{d\sigma}\Big|^2\, \theta\Big(\frac{d\mu^1}{d\sigma},\frac{d\mu^2}{d\sigma}\Big)^{-1} d\sigma . \]

Assumptions on $\theta$ will guarantee that the map
$(w,s,t)\mapsto w^2\,\theta(s,t)^{-1}$ is homogeneous, hence the definition of
$\mathcal{A}$ is independent of the choice of $\sigma$. Given two measures
$\bar\mu_0,\bar\mu_1\in\mathscr{P}(\mathbb{R}^d)$ we denote by
$\mathcal{CE}_{0,1}(\bar\mu_0,\bar\mu_1)$ the set of all sufficiently regular
solutions (to be made precise in Section 3) $(\mu_t,\nu_t)_{t\in[0,1]}$ of the
continuity equation (1.7).

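Definition. For $\bar\mu_0,\bar\mu_1\in\mathscr{P}(\mathbb{R}^d)$ we define

\[ \mathcal{W}(\bar\mu_0,\bar\mu_1)^2 := \inf\Big\{ \int_0^1 \mathcal{A}(\mu_t,\nu_t)\,dt \ :\ (\mu,\nu)\in\mathcal{CE}_{0,1}(\bar\mu_0,\bar\mu_1) \Big\} . \]

It is unclear whether $\mathcal{W}$ coincides with $W$ defined in (1.5) in full
generality. However, we will give a positive answer for the more restricted case
of a sufficiently regular translation invariant jump kernel such as $J_\alpha$
(see Proposition 5.8). We can now state the first main result of this article.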

Theorem 1.2. $\mathcal{W}$ defines a (pseudo-)metric on
$\mathscr{P}(\mathbb{R}^d)$. The topology it induces is stronger than the
topology of weak convergence. For each $\tau\in\mathscr{P}(\mathbb{R}^d)$ the set
$\mathscr{P}_\tau := \{\mu\in\mathscr{P}(\mathbb{R}^d) : \mathcal{W}(\mu,\tau)<\infty\}$
equipped with the distance $\mathcal{W}$ is a complete geodesic space.

Gradient flow of the entropy. Let us give a short formal argument why equation
(1.1) can be seen as the gradient flow of the relative entropy w.r.t. the
distance $\mathcal{W}$ if we choose $\theta$ to be the logarithmic mean.

In the classical setting many partial differential equations of the form

\[ \partial_t\rho - \nabla\cdot\big(\rho\nabla f'(\rho)\big) = 0 \]

can, at least formally, be seen as the gradient flow of the integral functional
$\mathcal{F}(\rho) = \int f(\rho)\,dm$ w.r.t. the $L^2$-Wasserstein distance.
Hence in the new geometry determined by the distance $W$ via (1.5), (1.6) the
gradient flow of the functional $\mathcal{F}$ should be given by the equation

\[ \partial_t\rho - \bar\nabla\cdot\big(\hat\rho\,\bar\nabla f'(\rho)\big) = 0 . \]

If we now consider the relative entropy $\mathcal{H}$ we have
$f'(r) = 1+\log r$. Taking into account (1.4), which gives
$\hat\rho(x,y)\,\bar\nabla\log\rho(x,y) = \bar\nabla\rho(x,y)$, we see that the
corresponding gradient flow is given by

\[ \partial_t\rho - \bar\nabla\cdot\big(\bar\nabla\rho\big) = 0 , \]

which is a weak formulation of (1.1). In particular we see that the appearance
of the logarithmic mean is necessary in order to account for the fact that the
discrete gradient lacks a chain rule.
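The identity behind this formal computation can be checked numerically. The following is a minimal sketch, not part of the original argument: the helper names `logmean` and `discrete_grad` and the sample density `rho` are illustrative assumptions. It verifies that the logarithmic mean turns the discrete gradient of $\log\rho$ into the discrete gradient of $\rho$, which is the non-local substitute for the chain rule $\rho\nabla\log\rho = \nabla\rho$.

```python
import math

def logmean(s, t):
    """Logarithmic mean (s - t) / (log s - log t), extended by continuity."""
    if s <= 0.0 or t <= 0.0:
        return 0.0          # the mean vanishes on the boundary
    if s == t:
        return s            # limit of the quotient as t -> s
    return (s - t) / (math.log(s) - math.log(t))

def discrete_grad(f, x, y):
    """Discrete gradient f(y) - f(x)."""
    return f(y) - f(x)

# Hypothetical positive density on the line (unnormalised; only pointwise values matter).
rho = lambda x: math.exp(-x * x) + 0.1
log_rho = lambda x: math.log(rho(x))

pairs = [(-1.0, 0.5), (0.0, 2.0), (0.3, 0.31), (1.5, -2.5)]
residuals = [
    logmean(rho(x), rho(y)) * discrete_grad(log_rho, x, y) - discrete_grad(rho, x, y)
    for x, y in pairs
]
print(max(abs(r) for r in residuals))  # zero up to floating point rounding
```

For any other choice of mean (for instance the arithmetic mean) the residuals would in general not vanish, which is one way to see why the logarithmic mean is singled out for the entropy.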

In the more restricted setting of a translation invariant jump kernel we can
indeed rigorously identify equation (1.1) as the gradient flow of the relative
entropy w.r.t. the corresponding metric $\mathcal{W}$ in the framework of the
metric theory developed in [1]. So assume for the rest of this introduction that
$J$ satisfies

\[ J(x+z,A+z) = J(x,A) \qquad \forall x,z\in\mathbb{R}^d ,\ A\subset\mathbb{R}^d\setminus\{x\} , \]

and let $m$ be Lebesgue measure. Then we can write $J(x,A) = \nu(A-x)$ for a
Lévy measure $\nu$ on $\mathbb{R}^d\setminus\{0\}$. The operator $\mathcal{L}$
generates a semigroup $P_t = \exp(t\mathcal{L})$ in $L^2(\mathbb{R}^d)$ that can
be represented by a kernel $p_t$:

\[ P_tf(x) = \int f(y)\,p_t(x,dy) . \]

In fact $p_t$ is the transition kernel of the Lévy process with characteristic
triplet $(0,0,\nu)$ in the sense of the Lévy-Khinchine formula (see e.g. [4]).
In the same way $\mathcal{L}$ generates a semigroup on
$\mathscr{P}(\mathbb{R}^d)$. Under certain further regularity assumptions on the
transition kernel (see Section 5 for a precise statement) we prove the following

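Theorem 1.3. The semigroup $P$ generated by $\mathcal{L}$ is the gradient flow
of the relative entropy in the sense that it satisfies the Evolution Variational
Inequality (EVI): For any $\mu\in\mathscr{P}(\mathbb{R}^d)$ and
$\sigma\in\mathscr{P}_\mu$ we have

\[ \frac12\frac{d^+}{dt}\mathcal{W}^2(P_t\mu,\sigma) + \mathcal{H}(P_t\mu) \le \mathcal{H}(\sigma) \qquad \forall t>0 . \tag{1.8} \]

Moreover the entropy is convex along $\mathcal{W}$-geodesics. More precisely,
let $\mu_0,\mu_1\in\mathscr{P}(\mathbb{R}^d)$ such that
$\mathcal{W}(\mu_0,\mu_1)<\infty$ and let $(\mu_t)_{t\in[0,1]}$ be a geodesic
connecting $\mu_0$ and $\mu_1$. Then we have

\[ \mathcal{H}(\mu_t) \le (1-t)\,\mathcal{H}(\mu_0) + t\,\mathcal{H}(\mu_1) . \]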

Among several ways to characterize gradient flows in metric spaces, the 
EVI is one of the strongest. For example it implies geodesic convexity of 
the entropy (see [11]). Convexity of the entropy along $\mathcal{W}$-geodesics
can be seen as a non-local analogue of McCann's displacement convexity [21],
which corresponds to convexity along geodesics of the $L^2$-Wasserstein
distance. For the choice $\nu(dy) = c_\alpha |y|^{-\alpha-d}\,dy$ with
$\alpha\in(0,2)$ and a suitable constant $c_\alpha$ we obtain the following

Corollary 1.4. The semigroup generated by the fractional Laplacian
$-(-\Delta)^{\alpha/2}$ is the gradient flow of the relative entropy w.r.t. the
metric $\mathcal{W}$ built from the jump kernel
$J_\alpha(x,dy) = c_\alpha |y-x|^{-\alpha-d}\,dy$.
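To get a feeling for the semigroup in Corollary 1.4, one can simulate it in a simplified setting. The sketch below is an illustration under assumptions that are not in the text: it works on the one-dimensional torus $[0,2\pi)$ instead of $\mathbb{R}^d$, discretises with a naive discrete Fourier transform, and uses the hypothetical helper names `dft`, `fractional_heat` and `entropy`. It applies the Fourier multiplier $e^{-t|k|^\alpha}$ of $\exp(-t(-\Delta)^{\alpha/2})$ and reports mass conservation, the semigroup property and the entropy along the flow.

```python
import cmath
import math

def dft(values, sign):
    """Naive O(N^2) discrete Fourier transform; sign=-1 forward, sign=+1 inverse (unnormalised)."""
    n = len(values)
    return [
        sum(v * cmath.exp(sign * 2j * math.pi * k * pos / n) for pos, v in enumerate(values))
        for k in range(n)
    ]

def fractional_heat(u, t, alpha):
    """Apply exp(-t (-Laplace)^(alpha/2)) on the torus via the multiplier exp(-t |k|^alpha)."""
    n = len(u)
    coeffs = dft(u, -1)
    freqs = [k if k <= n // 2 else k - n for k in range(n)]  # integer frequencies
    damped = [c * math.exp(-t * abs(k) ** alpha) for c, k in zip(coeffs, freqs)]
    return [z.real / n for z in dft(damped, +1)]

def entropy(u, dx):
    """Riemann-sum approximation of the integral of u log u."""
    return sum(v * math.log(v) for v in u) * dx

n = 32
dx = 2.0 * math.pi / n
raw = [math.exp(math.cos(j * dx)) for j in range(n)]
mass = sum(raw) * dx
u0 = [v / mass for v in raw]                 # probability density on the grid

u_half = fractional_heat(u0, 0.5, 1.5)       # time 0.5, alpha = 1.5
u_one = fractional_heat(u_half, 0.5, 1.5)    # two steps of length 0.5
u_direct = fractional_heat(u0, 1.0, 1.5)     # one step of length 1.0

print("mass:", sum(u0) * dx, sum(u_one) * dx)
print("semigroup defect:", max(abs(a - b) for a, b in zip(u_one, u_direct)))
print("entropy:", entropy(u0, dx), entropy(u_half, dx), entropy(u_one, dx))
```

The zero frequency is left untouched by the multiplier, which is why the total mass is preserved; the printed entropies give a quick sanity check of the expected dissipation along the flow.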

We expect that a similar result should also hold for semigroups associ- 
ated to suitable non-homogeneous jump kernels J. It would be desirable to 
find examples of kernels where the entropy is strictly geodesically convex. 
This could be exploited to derive new functional inequalities and rates of 
convergence to equilibrium for the corresponding evolution equation, as has 
been done in the discrete setting of finite Markov chains in [14]. However, 
establishing a stronger EVI($\kappa$) in concrete examples does not seem to be
an easy task and we will address this question in a forthcoming publication.
Moreover, we expect that the approach presented here can be generalized in
order to give a gradient flow interpretation to evolution equations associated
to Lévy-type operators with both a non-local and a diffusion part.

Organization of the paper. In Section 2 we study the action functional
$\mathcal{A}$ and establish various properties needed in the sequel. Section 3
is devoted to an analysis of the non-local continuity equation (1.7). In
Section 4 we define the metric $\mathcal{W}$ and prove Theorem 1.2. Finally, we
focus on translation invariant jump kernels and present the proof of Theorem 1.3
in Section 5.

2. The action functional

In this section we introduce and study an action functional on pairs of
measures. Let us first introduce some notation. We denote by
$\mathscr{P}(\mathbb{R}^d)$ the space of Borel probability measures on
$\mathbb{R}^d$ equipped with the topology of weak convergence. We let
$G = \{(x,y)\in\mathbb{R}^d\times\mathbb{R}^d : x\neq y\}$ and denote by
$\mathcal{M}_{loc}(G)$ the space of signed Radon measures on the open set $G$
equipped with the weak* topology in duality with continuous functions with
compact support in $G$.

The definition of the action functional and later the metric will depend on the
choice of a function $\theta:\mathbb{R}_+\times\mathbb{R}_+\to\mathbb{R}_+$. We
will always require it to fulfill the following assumptions:

Assumption 2.1. The function $\theta$ has the following properties:

(A1) (Regularity): $\theta$ is continuous on $\mathbb{R}_+\times\mathbb{R}_+$ and $C^\infty$ on $(0,\infty)\times(0,\infty)$;

(A2) (Symmetry): $\theta(s,t) = \theta(t,s)$ for $s,t\ge 0$;

(A3) (Positivity, normalisation): $\theta(s,t) > 0$ for $s,t>0$ and $\theta(1,1) = 1$;

(A4) (Zero at the boundary): $\theta(0,t) = 0$ for all $t\ge 0$;

(A5) (Monotonicity): $\theta(r,t)\le\theta(s,t)$ for all $0\le r\le s$ and $t\ge 0$;

(A6) (Positive homogeneity): $\theta(\lambda s,\lambda t) = \lambda\theta(s,t)$ for $\lambda>0$ and $s,t\ge 0$;

(A7) (Concavity): the function $\theta:\mathbb{R}_+\times\mathbb{R}_+\to\mathbb{R}_+$ is concave.

It is easy to check that these assumptions imply

\[ \theta(s,t) \le \frac{s+t}{2} \qquad \forall s,t\ge 0 . \tag{2.1} \]

In view of applications to gradient flows of the entropy we will be mostly
interested in a particular choice of $\theta$, namely the logarithmic mean
given by

\[ \theta(s,t) = \int_0^1 s^{\alpha}t^{1-\alpha}\,d\alpha = \frac{s-t}{\log s-\log t} , \tag{2.2} \]

the latter expression being valid for $s,t>0$, $s\neq t$. However, for future
use we will allow for more generality in the choice of $\theta$. Given a
function $\rho:\mathbb{R}^d\to\mathbb{R}_+$ we will often write

\[ \hat\rho(x,y) := \theta(\rho(x),\rho(y)) . \]
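As a sanity check of (2.2) and (2.1), the following minimal sketch compares the closed form of the logarithmic mean with a midpoint-rule approximation of the integral representation and tests symmetry, homogeneity and the bound by the arithmetic mean at a few sample points. The function names and the quadrature resolution are illustrative assumptions, not notation from the text.

```python
import math

def logmean(s, t):
    """Closed form of the logarithmic mean, extended continuously to s == t and to the boundary."""
    if s <= 0.0 or t <= 0.0:
        return 0.0
    if s == t:
        return s
    return (s - t) / (math.log(s) - math.log(t))

def logmean_integral(s, t, n=2000):
    """Midpoint-rule approximation of the integral of s^a * t^(1-a) over a in [0, 1], for s, t > 0."""
    h = 1.0 / n
    return h * sum(s ** ((i + 0.5) * h) * t ** (1.0 - (i + 0.5) * h) for i in range(n))

samples = [(0.5, 2.0), (1.0, 1.0), (3.0, 0.2), (10.0, 7.0)]
for s, t in samples:
    closed = logmean(s, t)
    quad = logmean_integral(s, t)
    symmetric = abs(closed - logmean(t, s)) < 1e-12
    homogeneous = abs(logmean(2.0 * s, 2.0 * t) - 2.0 * closed) < 1e-9
    below_arithmetic_mean = closed <= 0.5 * (s + t) + 1e-12
    print(s, t, closed, quad, symmetric, homogeneous, below_arithmetic_mean)
```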

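We can now define a function
$\alpha:\mathbb{R}\times\mathbb{R}_+\times\mathbb{R}_+\to\mathbb{R}_+\cup\{\infty\}$,
called the action density function, by setting

\[ \alpha(w,s,t) := \begin{cases} \dfrac{w^2}{2\theta(s,t)} , & \theta(s,t)\neq 0 , \\ 0 , & \theta(s,t) = 0 \text{ and } w = 0 , \\ +\infty , & \theta(s,t) = 0 \text{ and } w\neq 0 . \end{cases} \]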

The following observation will be useful. 

Lemma 2.2. The function $\alpha$ is lower semicontinuous, convex and positively
homogeneous, i.e.

\[ \alpha(\lambda w,\lambda s,\lambda t) = \lambda\,\alpha(w,s,t) \qquad \forall w\in\mathbb{R} ,\ s,t\ge 0 ,\ \lambda>0 . \]

Proof. This is easily checked using (A6), (A7) and the convexity of the function
$(x,y)\mapsto x^2/y$ on $\mathbb{R}\times(0,\infty)$. □
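The three cases in the definition of the action density and the properties stated in Lemma 2.2 can be illustrated with a short sketch. It assumes the logarithmic mean as the choice of $\theta$; the names `logmean` and `action_density` are hypothetical, and the check of convexity is only a midpoint test at two sample points, not a proof.

```python
import math

def logmean(s, t):
    """Logarithmic mean, extended by continuity to s == t and to the boundary."""
    if s <= 0.0 or t <= 0.0:
        return 0.0
    if s == t:
        return s
    return (s - t) / (math.log(s) - math.log(t))

def action_density(w, s, t, theta=logmean):
    """w^2 / (2 theta(s, t)), with value 0 if theta = 0 = w and +inf if theta = 0 != w."""
    th = theta(s, t)
    if th > 0.0:
        return w * w / (2.0 * th)
    return 0.0 if w == 0.0 else math.inf

# Positive homogeneity at one sample point.
w, s, t, lam = 1.5, 2.0, 0.5, 3.0
lhs = action_density(lam * w, lam * s, lam * t)
rhs = lam * action_density(w, s, t)

# Midpoint convexity between two sample points (w, s, t).
p, q = (1.0, 2.0, 0.5), (-0.5, 0.3, 4.0)
mid = tuple(0.5 * (a + b) for a, b in zip(p, q))
convex_gap = 0.5 * action_density(*p) + 0.5 * action_density(*q) - action_density(*mid)

print(lhs, rhs, convex_gap)
print(action_density(0.0, 0.0, 1.0), action_density(1.0, 0.0, 1.0))
```

The last line exercises the two degenerate cases: no flux across a pair where the mean vanishes costs nothing, while any non-zero flux there has infinite cost.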

We will now define an action functional on pairs of measures $(\mu,\nu)$ where
$\mu\in\mathscr{P}(\mathbb{R}^d)$ and $\nu\in\mathcal{M}_{loc}(G)$. To $\mu$ we
associate two Radon measures in $\mathcal{M}_{loc}(G)$ by setting:

\[ \mu^1(dx,dy) := J(x,dy)\,\mu(dx) , \qquad \mu^2(dx,dy) := J(y,dx)\,\mu(dy) . \tag{2.3} \]

We can always choose a measure $\sigma\in\mathcal{M}_{loc}(G)$ such that
$\mu^i = \rho^i\sigma$, $i=1,2$, and $\nu = w\sigma$ are all absolutely
continuous with respect to $\sigma$. For example take the sum of the total
variations $\sigma := |\mu^1| + |\mu^2| + |\nu|$. We can then define the action
functional by

\[ \mathcal{A}(\mu,\nu) := \int \alpha(w,\rho^1,\rho^2)\,d\sigma . \]

Note that this definition is independent of the choice of $\sigma$ since
$\alpha$ is positively homogeneous. Hence we can also write the action
functional as

\[ \mathcal{A}(\mu,\nu) = \int \alpha\Big(\frac{d\lambda}{d|\lambda|}\Big)\,d|\lambda| , \]

where $\lambda$ is the vector valued measure given by
$\lambda = (\nu,\mu^1,\mu^2)$.

In the case where the measure $\mu$ is absolutely continuous w.r.t. $m$ the next
lemma shows that the action takes a more intuitive form. For this we denote by
$Jm\in\mathcal{M}_{loc}(G)$ the measure given by $Jm(dx,dy) = J(x,dy)\,m(dx)$.

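Lemma 2.3. Let $\mu\in\mathscr{P}(\mathbb{R}^d)$ be absolutely continuous w.r.t.
$m$ with density $\rho$. Further let $\nu\in\mathcal{M}_{loc}(G)$ such that
$\mathcal{A}(\mu,\nu)<\infty$. Then there exists a function
$w:G\to\mathbb{R}$ such that $\nu = w\hat\rho\,Jm$ and we have

\[ \mathcal{A}(\mu,\nu) = \frac12\int |w(x,y)|^2\,\hat\rho(x,y)\,J(x,dy)\,m(dx) . \tag{2.4} \]

Proof. Choose $\lambda\in\mathcal{M}_{loc}(G)$ such that $Jm = h\lambda$ and
$\nu = \tilde w\lambda$ are both absolutely continuous w.r.t. $\lambda$. Note
that $\mu^i = \rho^i Jm$, $i=1,2$, with $\rho^1(x,y) = \rho(x)$ and
$\rho^2(x,y) = \rho(y)$. Further, we denote by $\tilde\rho^i$ the density of
$\mu^i$ w.r.t. $\lambda$. Now by definition,

\[ \mathcal{A}(\mu,\nu) = \int \alpha(\tilde w,\tilde\rho^1,\tilde\rho^2)\,d\lambda < \infty . \tag{2.5} \]

Let $A\subset G$ such that $\int_A\theta(\rho^1,\rho^2)\,dJm = 0$. From the
homogeneity of $\theta$ we conclude

\[ 0 = \int_A \theta(\rho^1,\rho^2)\,dJm = \int_A \theta(\tilde\rho^1,\tilde\rho^2)\,d\lambda , \]

i.e. $\theta(\tilde\rho^1,\tilde\rho^2) = 0$ $\lambda$-a.e. on $A$. Now the
finiteness of the integral in (2.5) implies that $\tilde w = 0$ $\lambda$-a.e.
on $A$. In other words $|\nu|(A) = 0$ and hence $\nu$ is absolutely continuous
w.r.t. the measure $\hat\rho\,Jm$. Formula (2.4) now follows immediately from
the homogeneity of $\alpha$. □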
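Formula (2.4) and the independence of the reference measure can be made concrete on a finite state space, in the spirit of the discrete setting of [20]. The sketch below is an illustration under assumptions: the three-state example, the symmetric conductances `c`, the potential `psi` and all function names are hypothetical. It evaluates the action once through the definition with reference measure `scale` times counting measure and once through the right-hand side of (2.4).

```python
import math

def logmean(s, t):
    if s <= 0.0 or t <= 0.0:
        return 0.0
    if s == t:
        return s
    return (s - t) / (math.log(s) - math.log(t))

def alpha(w, s, t):
    th = logmean(s, t)
    if th > 0.0:
        return w * w / (2.0 * th)
    return 0.0 if w == 0.0 else math.inf

states = range(3)
m = [0.5, 0.25, 0.25]                                     # reference measure
c = [[0.0, 1.0, 0.5], [1.0, 0.0, 2.0], [0.5, 2.0, 0.0]]   # symmetric conductances
J = [[c[x][y] / m[x] for y in states] for x in states]    # so J(x,y) m(x) = c(x,y) is symmetric
rho = [0.6, 2.0, 0.8]                                     # density of mu w.r.t. m, total mass 1
psi = [0.0, 1.0, -0.5]                                    # potential, w(x,y) = psi(y) - psi(x)

def action_via_definition(scale):
    """Sum of alpha(dnu/dsigma, dmu1/dsigma, dmu2/dsigma) against sigma = scale * counting measure."""
    total = 0.0
    for x in states:
        for y in states:
            if x == y:
                continue
            nu = (psi[y] - psi[x]) * logmean(rho[x], rho[y]) * J[x][y] * m[x]
            mu1 = J[x][y] * rho[x] * m[x]
            mu2 = J[y][x] * rho[y] * m[y]
            total += alpha(nu / scale, mu1 / scale, mu2 / scale) * scale
    return total

def action_via_formula():
    """Right-hand side of (2.4): one half of the sum of w^2 * rho_hat * J * m over pairs x != y."""
    return 0.5 * sum(
        (psi[y] - psi[x]) ** 2 * logmean(rho[x], rho[y]) * J[x][y] * m[x]
        for x in states for y in states if x != y
    )

print(action_via_definition(1.0), action_via_definition(3.0), action_via_formula())
```

All three numbers agree up to rounding, reflecting the positive homogeneity of the action density and of $\theta$.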
Lemma 2.4 (Lower semicontinuity of the action). $\mathcal{A}$ is lower
semicontinuous w.r.t. weak convergence of measures. More precisely, assume that
$\mu_n\rightharpoonup\mu$ weakly in $\mathscr{P}(\mathbb{R}^d)$ and
$\nu_n\stackrel{*}{\rightharpoonup}\nu$ weakly* in $\mathcal{M}_{loc}(G)$. Then

\[ \mathcal{A}(\mu,\nu) \le \liminf_{n}\,\mathcal{A}(\mu_n,\nu_n) . \]

Proof. Note that by Assumption 1.1 the weak convergence of $\mu_n$ to $\mu$
implies the weak* convergence of $\mu_n^i$ to $\mu^i$ in $\mathcal{M}_{loc}(G)$
for $i=1,2$. Now the claim follows immediately from a general result on integral
functionals, Proposition 2.5. □

Proposition 2.5 ([8, Thm. 3.4.3]). Let $\Omega$ be a locally compact Polish
space and let $f:\Omega\times\mathbb{R}^n\to[0,+\infty]$ be a lower
semicontinuous function such that $f(\omega,\cdot)$ is convex and positively
1-homogeneous for every $\omega\in\Omega$. Then the functional

\[ F(\lambda) = \int_\Omega f\Big(\omega,\frac{d\lambda}{d|\lambda|}(\omega)\Big)\,|\lambda|(d\omega) \]

is sequentially weak* lower semicontinuous on the space of vector valued signed
Radon measures $\mathcal{M}_{loc}(\Omega,\mathbb{R}^n)$.

The next estimate will be crucial for establishing compactness of families 
of curves with bounded action in Section 3. 

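Lemma 2.6. i) There exists a constant $C>0$ such that for all
$\mu\in\mathscr{P}(\mathbb{R}^d)$ and $\nu\in\mathcal{M}_{loc}(G)$ we have:

\[ \int_G (1\wedge|x-y|)\,|\nu|(dx,dy) \le C\sqrt{\mathcal{A}(\mu,\nu)} . \]

ii) For each compact set $K\subset G$ there exists a constant $C(K)>0$ such
that for all $\mu\in\mathscr{P}(\mathbb{R}^d)$ and $\nu\in\mathcal{M}_{loc}(G)$
we have:

\[ |\nu|(K) \le C(K)\sqrt{\mathcal{A}(\mu,\nu)} . \]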

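Proof. To prove i) let us define the measure
$\lambda = |\mu^1| + |\mu^2| + |\nu|$ and write $\mu^i = \rho^i\lambda$,
$\nu = w\lambda$. We can assume that $\mathcal{A}(\mu,\nu)<\infty$ as otherwise
there is nothing to prove. This implies that the set
$A = \{(x,y) \mid \alpha(w,\rho^1,\rho^2) = \infty\}$ has zero measure with
respect to $\lambda$. We can now estimate:

\begin{align*}
\int_G (1\wedge|x-y|)\,|\nu|(dx,dy)
 &= \int_{G\setminus A} (1\wedge|x-y|)\,|w|\,d\lambda \\
 &= \int_{G\setminus A} (1\wedge|x-y|)\,\sqrt{2\theta(\rho^1,\rho^2)}\,\sqrt{\alpha(w,\rho^1,\rho^2)}\,d\lambda \\
 &\le \Big(\int_G (1\wedge|x-y|^2)\,2\theta(\rho^1,\rho^2)\,d\lambda\Big)^{1/2} \Big(\int_G \alpha(w,\rho^1,\rho^2)\,d\lambda\Big)^{1/2} \\
 &\le C\sqrt{\mathcal{A}(\mu,\nu)} .
\end{align*}

The last inequality follows, since by the estimate (2.1) and Assumption 1.1 we
have:

\begin{align*}
\int_G (1\wedge|x-y|^2)\,\theta(\rho^1,\rho^2)\,d\lambda
 &\le \int_G (1\wedge|x-y|^2)\,\tfrac12(\rho^1+\rho^2)\,d\lambda \\
 &= \int_G (1\wedge|x-y|^2)\,J(x,dy)\,\mu(dx) \\
 &\le \sup_x \int (1\wedge|x-y|^2)\,J(x,dy) < \infty .
\end{align*}

To prove ii) we note that by a similar argument

\[ |\nu|(K) \le \Big(\int_K 2\,J(x,dy)\,\mu(dx)\Big)^{1/2} \sqrt{\mathcal{A}(\mu,\nu)} . \]

□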
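The Cauchy-Schwarz step in the proof of Lemma 2.6 i) can be checked on a small example. The following sketch is only an illustration under assumptions: the finite point configuration `pts`, the symmetric rates `J`, the uniform reference measure and the arbitrary signed weights `nu` are all hypothetical. It compares the weighted total variation with the two successive upper bounds from the proof.

```python
import math

def logmean(s, t):
    if s <= 0.0 or t <= 0.0:
        return 0.0
    if s == t:
        return s
    return (s - t) / (math.log(s) - math.log(t))

pts = [0.0, 0.4, 1.7, 3.0]                     # hypothetical points on the line
n = len(pts)
m = [1.0 / n] * n                              # uniform reference measure
J = [[0.0 if i == j else 1.0 / (1.0 + abs(pts[i] - pts[j])) for j in range(n)] for i in range(n)]
rho = [0.4, 1.2, 1.6, 0.8]                     # density w.r.t. m, total mass 1
nu = [[0.0 if i == j else math.sin(1.0 + i - 2.0 * j) for j in range(n)] for i in range(n)]

def cutoff(i, j):
    """The weight 1 ∧ |x - y|."""
    return min(1.0, abs(pts[i] - pts[j]))

pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
mu1 = {(i, j): J[i][j] * rho[i] * m[i] for i, j in pairs}
mu2 = {(i, j): J[j][i] * rho[j] * m[j] for i, j in pairs}

# Action with counting measure on pairs as reference measure.
action = sum(nu[i][j] ** 2 / (2.0 * logmean(mu1[i, j], mu2[i, j])) for i, j in pairs)
lhs = sum(cutoff(i, j) * abs(nu[i][j]) for i, j in pairs)
weight = sum(cutoff(i, j) ** 2 * 2.0 * logmean(mu1[i, j], mu2[i, j]) for i, j in pairs)
sup_bound = 2.0 * max(sum(cutoff(i, j) ** 2 * J[i][j] for j in range(n) if j != i) for i in range(n))

print(lhs, math.sqrt(weight * action), math.sqrt(sup_bound * action))
```

The three printed numbers are increasing: the first inequality is Cauchy-Schwarz, the second uses the bound (2.1) of the mean by the arithmetic mean together with the fact that $\mu$ has total mass one.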



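Lemma 2.7 (Convexity of the action). Let $\mu^j\in\mathscr{P}(\mathbb{R}^d)$ and
$\nu^j\in\mathcal{M}_{loc}(G)$ for $j=0,1$. For $\tau\in[0,1]$ set
$\mu^\tau = \tau\mu^1 + (1-\tau)\mu^0$ and $\nu^\tau = \tau\nu^1 + (1-\tau)\nu^0$.
Then we have:

\[ \mathcal{A}(\mu^\tau,\nu^\tau) \le \tau\,\mathcal{A}(\mu^1,\nu^1) + (1-\tau)\,\mathcal{A}(\mu^0,\nu^0) . \]

Proof. Let us fix a reference measure $\lambda\in\mathcal{M}_{loc}(G)$ such that
$\mu^{j,i}$, $\nu^j$ for $j=0,1$ and $i=1,2$ are all absolutely continuous
w.r.t. $\lambda$ and write $\mu^{j,i} = \rho^{j,i}\lambda$ and
$\nu^j = w^j\lambda$. Note that $\mu^{\tau,i} = \rho^{\tau,i}\lambda$ with
$\rho^{\tau,i} = \tau\rho^{1,i} + (1-\tau)\rho^{0,i}$ and
$\nu^\tau = w^\tau\lambda$ with $w^\tau = \tau w^1 + (1-\tau)w^0$. From the
convexity of the action density function $\alpha$ we obtain:

\begin{align*}
\mathcal{A}(\mu^\tau,\nu^\tau)
 &= \int \alpha(w^\tau,\rho^{\tau,1},\rho^{\tau,2})\,d\lambda \\
 &\le \tau\int \alpha(w^1,\rho^{1,1},\rho^{1,2})\,d\lambda + (1-\tau)\int \alpha(w^0,\rho^{0,1},\rho^{0,2})\,d\lambda \\
 &= \tau\,\mathcal{A}(\mu^1,\nu^1) + (1-\tau)\,\mathcal{A}(\mu^0,\nu^0) .
\end{align*}

□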

We will now show that the action functional enjoys a monotonicity property
under convolution if we assume that the jump kernel is translation invariant in
the sense that

\[ J(x+z,A+z) = J(x,A) \qquad \forall x,z\in\mathbb{R}^d ,\ A\in\mathcal{B}(\mathbb{R}^d) . \tag{2.6} \]

For the rest of this section we also assume that $m$ is Lebesgue measure. We
first need to fix a way of convolving measures on $\mathbb{R}^d$ and on $G$ in a
consistent manner. Let $k$ be a convolution kernel, i.e.
$k:\mathbb{R}^d\to\mathbb{R}_+$ satisfying $\int k(z)\,dz = 1$. Given a measure
$\mu\in\mathscr{P}(\mathbb{R}^d)$, its convolution is defined as usual by

\[ (\mu*k)(A) := \int k(z)\,\mu(A-z)\,dz \qquad \forall A\in\mathcal{B}(\mathbb{R}^d) . \]

On the other hand, given a measure $\nu\in\mathcal{M}_{loc}(G)$ we define
$\nu*k\in\mathcal{M}_{loc}(G)$ by setting for all Borel measurable sets
$B\subset G$

\[ (\nu*k)(B) := \int k(z)\,\nu\big(B-(z,z)\big)\,dz . \tag{2.7} \]

Note that this implies in particular that for every bounded function
$f:G\to\mathbb{R}$ with compact support in $G$ we have:

\[ \int f(x,y)\,(\nu*k)(dx,dy) = \int\!\!\int k(z)\,f(x+z,y+z)\,\nu(dx,dy)\,dz . \]

We now have the following monotonicity property under convolution. 

Proposition 2.8. Assume that $J$ satisfies (2.6) and let $k$ be a convolution
kernel. Then for every $\mu\in\mathscr{P}(\mathbb{R}^d)$,
$\nu\in\mathcal{M}_{loc}(G)$ we have

\[ \mathcal{A}(\mu*k,\nu*k) \le \mathcal{A}(\mu,\nu) . \tag{2.8} \]

Proof. We can assume without restriction that $\mathcal{A}(\mu,\nu)$ is finite
as otherwise there is nothing to prove. Let us introduce the maps
$\tau_z: x\mapsto x+z$ for $z\in\mathbb{R}^d$ and let us denote by
$\mu_z,\nu_z$ the push forwards $(\tau_z)_*\mu = \mu(\cdot-z)$, resp.
$(\tau_z\times\tau_z)_*\nu = \nu(\cdot-(z,z))$. Using the convexity of the
action functional, Lemma 2.7, together with its lower semicontinuity, Lemma 2.4,
we see that

\[ \mathcal{A}(\mu*k,\nu*k) \le \int \mathcal{A}(\mu_z,\nu_z)\,k(z)\,dz . \]

Thus the proof is complete if we show that
$\mathcal{A}(\mu_z,\nu_z) = \mathcal{A}(\mu,\nu)$ for all $z\in\mathbb{R}^d$. To
this end recall the definition (2.3). Using the invariance property (2.6) it is
immediate to check that $\mu_z^i = (\tau_z\times\tau_z)_*\mu^i$ for $i=1,2$. Now
choose $\lambda\in\mathcal{M}_{loc}(G)$ with $\mu^i = \rho^i\lambda$ and
$\nu = w\lambda$. Then for all $z\in\mathbb{R}^d$ we have
$(\mu_z)^i = (\mu^i)_z = \rho^i(\cdot-(z,z))\,\lambda_z$ and
$\nu_z = w(\cdot-(z,z))\,\lambda_z$. Hence we finally obtain

\begin{align*}
\mathcal{A}(\mu_z,\nu_z)
 &= \int \alpha\big(w(\cdot-(z,z)),\rho^1(\cdot-(z,z)),\rho^2(\cdot-(z,z))\big)\,d\lambda_z \\
 &= \int \alpha(w,\rho^1,\rho^2)\,d\lambda = \mathcal{A}(\mu,\nu) .
\end{align*}

□

3. A non-local continuity equation

In this section we will consider the continuity equation

\[ \partial_t\mu_t + \bar\nabla\cdot\nu_t = 0 \qquad \text{on } (0,T)\times\mathbb{R}^d . \tag{3.1} \]

Here $(\mu_t)_{t\in[0,T]}$ and $(\nu_t)_{t\in[0,T]}$ are Borel families of
measures in $\mathscr{P}(\mathbb{R}^d)$ and $\mathcal{M}_{loc}(G)$ respectively
such that

\[ \int_0^T\!\!\int (1\wedge|x-y|)\,|\nu_t|(dx,dy)\,dt < \infty . \tag{3.2} \]

We suppose that (3.1) holds in the sense of distributions. More precisely, we
require that for all $\varphi\in C_c^\infty((0,T)\times\mathbb{R}^d)$:

\[ \int_0^T\!\!\int \partial_t\varphi_t(x)\,\mu_t(dx)\,dt + \frac12\int_0^T\!\!\int \bar\nabla\varphi_t(x,y)\,\nu_t(dx,dy)\,dt = 0 . \tag{3.3} \]

Recall that for a function $\varphi:\mathbb{R}^d\to\mathbb{R}$ we denote by
$\bar\nabla\varphi(x,y) = \varphi(y)-\varphi(x)$ the discrete gradient. Note
that (3.2) is a natural integrability assumption which ensures that the second
term in (3.3) is well-defined. The following is an adaptation of
[1, Lemma 8.1.2].

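Lemma 3.1. Let $(\mu_t)_{t\in[0,T]}$ and $(\nu_t)_{t\in[0,T]}$ be Borel families
of measures in $\mathscr{P}(\mathbb{R}^d)$ and $\mathcal{M}_{loc}(G)$ satisfying
(3.1) and (3.2). Then there exists a weakly continuous curve
$(\tilde\mu_t)_{t\in[0,T]}$ such that $\tilde\mu_t = \mu_t$ for a.e.
$t\in[0,T]$. Moreover, for every
$\varphi\in C_c^\infty([0,T]\times\mathbb{R}^d)$ and all
$0\le t_0\le t_1\le T$ we have:

\[ \int \varphi_{t_1}\,d\tilde\mu_{t_1} - \int \varphi_{t_0}\,d\tilde\mu_{t_0} = \int_{t_0}^{t_1}\!\!\int \partial_t\varphi\,d\tilde\mu_t\,dt + \frac12\int_{t_0}^{t_1}\!\!\int \bar\nabla\varphi\,d\nu_t\,dt . \tag{3.4} \]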

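Proof. Let us set

\[ V(t) := \int (1\wedge|x-y|)\,|\nu_t|(dx,dy) . \]

By assumption $t\mapsto V(t)$ belongs to $L^1(0,T)$. Fix
$\xi\in C_c^\infty(\mathbb{R}^d)$. We claim that the map
$t\mapsto\mu_t(\xi) = \int\xi\,d\mu_t$ belongs to $W^{1,1}(0,T)$. Indeed, using
test functions of the form $\varphi(t,x) = \eta(t)\xi(x)$ with
$\eta\in C_c^\infty(0,T)$, equation (3.3) shows that the distributional
derivative of $\mu_t(\xi)$ is given by

\[ \dot\mu_t(\xi) = \frac12\int \bar\nabla\xi\,d\nu_t \]

for a.e. $t\in(0,T)$ and we can estimate

\[ |\dot\mu_t(\xi)| \le \frac12\int |\bar\nabla\xi|\,d|\nu_t| \le \|\xi\|_{C^1}\,V(t) . \tag{3.5} \]

Based on (3.5) we can argue as in [1, Lemma 8.1.2] to obtain existence of a
weakly continuous representative $t\mapsto\tilde\mu_t$.

To prove (3.4) fix $\varphi\in C_c^\infty([0,T]\times\mathbb{R}^d)$ and choose
$\eta_\varepsilon\in C_c^\infty(t_0,t_1)$ such that

\[ 0\le\eta_\varepsilon\le 1 , \qquad \lim_{\varepsilon\to0}\eta_\varepsilon(t) = \mathbf{1}_{(t_0,t_1)}(t)\ \ \forall t\in[0,T] , \qquad \lim_{\varepsilon\to0}\eta_\varepsilon' = \delta_{t_0}-\delta_{t_1} . \]

Now equation (3.3) implies

\[ -\int_0^T \eta_\varepsilon'\int \varphi\,d\tilde\mu_t\,dt = \int_0^T \eta_\varepsilon\int \partial_t\varphi\,d\tilde\mu_t\,dt + \frac12\int_0^T \eta_\varepsilon\int \bar\nabla\varphi\,d\nu_t\,dt . \]

Thanks to the continuity of $t\mapsto\tilde\mu_t$ we can pass to the limit as
$\varepsilon\to0$ and obtain (3.4). □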

In view of the previous lemma it makes sense to define solutions to the
continuity equation in the following way.

Definition 3.2. We denote by $\mathcal{CE}_T(\bar\mu_0,\bar\mu_1)$ the set of
all pairs $(\mu,\nu)$ satisfying the following conditions:

(i) $\mu:[0,T]\to\mathscr{P}(\mathbb{R}^d)$ is weakly continuous;
(ii) $\mu_0 = \bar\mu_0$, $\mu_T = \bar\mu_1$;
(iii) $(\nu_t)_{t\in[0,T]}$ is a Borel family of measures in $\mathcal{M}_{loc}(G)$;
(iv) $\int_0^T\!\int (1\wedge|x-y|)\,|\nu_t|(dx,dy)\,dt < \infty$;
(v) We have in the sense of distributions: $\partial_t\mu_t + \bar\nabla\cdot\nu_t = 0$. (3.6)




The following result will allow us to extract subsequential limits from 
sequences of solutions to the continuity equation which have bounded action. 

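Proposition 3.3 (Compactness of solutions to the continuity equation). Let
$(\mu^n,\nu^n)$ be a sequence in $\mathcal{CE}_T(\bar\mu_0,\bar\mu_1)$ such that

\[ \sup_n \int_0^T \mathcal{A}(\mu_t^n,\nu_t^n)\,dt < \infty . \tag{3.7} \]

Then there exists a couple $(\mu,\nu)\in\mathcal{CE}_T(\bar\mu_0,\bar\mu_1)$ such
that up to extraction of a subsequence

\[ \mu_t^n \rightharpoonup \mu_t \ \text{ weakly in } \mathscr{P}(\mathbb{R}^d) \text{ for all } t\in[0,T] , \]
\[ \nu^n \stackrel{*}{\rightharpoonup} \nu \ \text{ weakly* in } \mathcal{M}_{loc}(G\times(0,T)) . \]

Moreover along this subsequence we have:

\[ \int_0^T \mathcal{A}(\mu_t,\nu_t)\,dt \le \liminf_n \int_0^T \mathcal{A}(\mu_t^n,\nu_t^n)\,dt . \]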

Proof. For each $n$ define the measure
$\nu^n := \int_0^T \nu_t^n\,dt \in \mathcal{M}_{loc}(G\times(0,T))$. From
Lemma 2.6 and (3.7) we infer immediately that

\[ \sup_n \int_0^T\!\!\int (1\wedge|x-y|)\,|\nu_t^n|(dx,dy)\,dt < \infty . \tag{3.8} \]

Moreover, for every compact set $K\subset G$ we obtain

\[ \sup_n |\nu^n|(K\times[0,T]) \le \sup_n \int_0^T |\nu_t^n|(K)\,dt < \infty , \tag{3.9} \]

i.e. $\nu^n$ has total variation uniformly bounded on every compact subset of
$G\times[0,T]$. Hence we can extract a subsequence (still indexed by $n$) such
that $\nu^n\stackrel{*}{\rightharpoonup}\nu$ in
$\mathcal{M}_{loc}(G\times[0,T])$. By the disintegration theorem we have the
representation $\nu = \int_0^T\nu_t\,dt$ for a Borel family $(\nu_t)$ still
satisfying (3.2). Let us set $D = \{(x,x) : x\in\mathbb{R}^d\}$ and define the
finite measures $\tilde\nu^n\in\mathcal{M}(\mathbb{R}^{2d}\times[0,T])$ given by
$\tilde\nu^n(dx,dy,dt) = (1\wedge|x-y|)\,\nu_t^n(dx,dy)\,dt$ on $G\times[0,T]$
and $\tilde\nu^n(D\times[0,T]) = 0$. (3.8) implies that (up to extraction of
another subsequence) $\tilde\nu^n\stackrel{*}{\rightharpoonup}\tilde\nu$ in
$\mathcal{M}(\mathbb{R}^{2d}\times[0,T])$, where $\tilde\nu$ is defined
similarly to $\tilde\nu^n$.

Let $0\le t_0\le t_1\le T$ and $\xi\in C_c^\infty(\mathbb{R}^d)$. We claim that

\[ \int_{t_0}^{t_1}\!\!\int \bar\nabla\xi\,d\nu_t^n\,dt \ \longrightarrow\ \int_{t_0}^{t_1}\!\!\int \bar\nabla\xi\,d\nu_t\,dt . \tag{3.10} \]

Let us define $\beta:\mathbb{R}^{2d}\times[0,T]\to\mathbb{R}$ by setting

\[ \beta(x,y,t) = \begin{cases} \mathbf{1}_{(t_0,t_1)}(t)\,\bar\nabla\xi(x,y)\,(1\wedge|x-y|)^{-1} , & x\neq y , \\ 0 , & x = y . \end{cases} \]

Now (3.10) is equivalent to
$\int\beta\,d\tilde\nu^n\to\int\beta\,d\tilde\nu$. Note that $\beta$ is bounded
with compact support and that the discontinuity set of $\beta$ is concentrated
on $\mathbb{R}^{2d}\times\{t_0,t_1\}\cup D\times[0,T]$, which is negligible for
$\tilde\nu$. Hence the claim follows from general convergence results (see e.g.
[1, Prop. 5.1.10]).

Combining now the convergence (3.10) with (3.4) for $\varphi(t,x) = \xi(x)$ and
$t_0 = 0$, $t_1 = t$ we infer that $\mu_t^n$ converges weakly to some
$\mu_t\in\mathscr{P}(\mathbb{R}^d)$ for every $t\in[0,T]$. It is easily checked
that the couple $(\mu,\nu)$ belongs to $\mathcal{CE}_T(\bar\mu_0,\bar\mu_1)$. As
in Lemma 2.4 the lower semicontinuity now follows from Proposition 2.5 by
considering $\int_0^T\mathcal{A}(\mu_t,\nu_t)\,dt$ as an integral functional on
the space $\mathcal{M}_{loc}(G\times[0,T])$. □

4. A non-local transport distance

We are now ready to give the definition of the distance $\mathcal{W}$. We will
then establish various properties, in particular existence of geodesics.
Moreover, we will characterize absolutely continuous curves in the resulting
metric space.

Definition 4.1. For $\bar\mu_0,\bar\mu_1\in\mathscr{P}(\mathbb{R}^d)$ we define

\[ \mathcal{W}(\bar\mu_0,\bar\mu_1)^2 := \inf\Big\{ \int_0^1 \mathcal{A}(\mu_t,\nu_t)\,dt \ :\ (\mu,\nu)\in\mathcal{CE}_1(\bar\mu_0,\bar\mu_1) \Big\} . \tag{4.1} \]

Let us first give an equivalent characterization of the infimum in (4.1). 
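Lemma 4.2. For any $T>0$ and $\bar\mu_0,\bar\mu_1\in\mathscr{P}(\mathbb{R}^d)$
we have:

\[ \mathcal{W}(\bar\mu_0,\bar\mu_1) = \inf\Big\{ \int_0^T \sqrt{\mathcal{A}(\mu_t,\nu_t)}\,dt \ :\ (\mu,\nu)\in\mathcal{CE}_T(\bar\mu_0,\bar\mu_1) \Big\} . \tag{4.2} \]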

Proof. This follows from a standard reparametrization argument. See [1, 
Lem. 1.1.4] or [12, Thm. 5.4] for details in similar situations. □ 

The next result shows that the infimum in the definition above is in fact 
a minimum. 



Proposition 4.3. Let $\bar\mu_0,\bar\mu_1\in\mathscr{P}(\mathbb{R}^d)$ be such
that $W := \mathcal{W}(\bar\mu_0,\bar\mu_1)$ is finite. Then the infimum in
(4.1) is attained by a curve $(\mu,\nu)\in\mathcal{CE}_1(\bar\mu_0,\bar\mu_1)$
satisfying $\mathcal{A}(\mu_t,\nu_t) = W^2$ for a.e. $t\in[0,1]$.

Proof. Existence of a minimizing curve
$(\mu,\nu)\in\mathcal{CE}_1(\bar\mu_0,\bar\mu_1)$ follows immediately by the
direct method taking into account Proposition 3.3. Invoking Lemma 4.2 and
Jensen's inequality we see that this curve satisfies

\[ \int_0^1 \sqrt{\mathcal{A}(\mu_t,\nu_t)}\,dt \ \ge\ W = \Big(\int_0^1 \mathcal{A}(\mu_t,\nu_t)\,dt\Big)^{1/2} \ \ge\ \int_0^1 \sqrt{\mathcal{A}(\mu_t,\nu_t)}\,dt . \]

Hence we must have $\mathcal{A}(\mu_t,\nu_t) = W^2$ for a.e. $t\in[0,1]$. □

We now prove the first main result, Theorem 1.2 announced in the introduction,
which we recall here for convenience.

Theorem 4.4. $\mathcal{W}$ defines a (pseudo-)metric on
$\mathscr{P}(\mathbb{R}^d)$. The topology it induces is stronger than the weak
topology and bounded sets w.r.t. $\mathcal{W}$ are weakly compact. Moreover, the
map $(\mu_0,\mu_1)\mapsto\mathcal{W}(\mu_0,\mu_1)$ is lower semicontinuous
w.r.t. weak convergence. For each $\tau\in\mathscr{P}(\mathbb{R}^d)$ the set
$\mathscr{P}_\tau := \{\mu\in\mathscr{P}(\mathbb{R}^d) : \mathcal{W}(\mu,\tau)<\infty\}$
equipped with the distance $\mathcal{W}$ is a complete geodesic space.

Proof. Symmetry of $\mathcal{W}$ is obvious from the fact that
$\alpha(w,\cdot,\cdot) = \alpha(-w,\cdot,\cdot)$. Equation (3.4) from Lemma 3.1
shows that two curves in $\mathcal{CE}_1$ can be concatenated to obtain a curve
in $\mathcal{CE}_2$. Hence the triangle inequality follows easily using
Lemma 4.2. To see that $\mathcal{W}(\bar\mu_0,\bar\mu_1)>0$ whenever
$\bar\mu_0\neq\bar\mu_1$ assume that $\mathcal{W}(\bar\mu_0,\bar\mu_1) = 0$ and
choose a minimizing curve $(\mu,\nu)\in\mathcal{CE}_1(\bar\mu_0,\bar\mu_1)$.
Then we must have $\mathcal{A}(\mu_t,\nu_t) = 0$ and hence $\nu_t = 0$ for a.e.
$t\in(0,1)$. From the continuity equation in the form (3.4) we infer
$\bar\mu_0 = \bar\mu_1$.

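Let us now show that the topology induced by $\mathcal{W}$ is stronger than the
weak one. Let $\mu_n,\mu\in\mathscr{P}(\mathbb{R}^d)$ with
$\mathcal{W}(\mu_n,\mu)\to0$ and choose minimizing curves
$(\mu^n,\nu^n)\in\mathcal{CE}_1(\mu_n,\mu)$. Fix a function
$\xi:\mathbb{R}^d\to\mathbb{R}$ bounded in $C^1$. Using the continuity equation
in the form (3.4) and Lemma 2.6 we estimate:

\[ \Big|\int\xi\,d\mu_n - \int\xi\,d\mu\Big| \le \|\xi\|_{C^1}\int_0^1\!\!\int (1\wedge|x-y|)\,|\nu_t^n|(dx,dy)\,dt \le C\,\|\xi\|_{C^1}\int_0^1\sqrt{\mathcal{A}(\mu_t^n,\nu_t^n)}\,dt = C\,\|\xi\|_{C^1}\,\mathcal{W}(\mu_n,\mu) . \]

This implies $\mu_n\rightharpoonup\mu$ weakly.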

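The compactness assertion and lower semicontinuity of $\mathcal{W}$ follow immediately from Proposition 3.3. Let us now fix $\tau\in\mathscr{P}(\mathbb{R}^d)$ and let $\mu_0,\mu_1\in\mathscr{P}_\tau$. By the triangle inequality we have $\mathcal{W}(\mu_0,\mu_1)<\infty$ and hence Proposition 4.3 yields existence of a minimizing curve $(\mu,\boldsymbol{\nu})\in\mathcal{CE}_1(\mu_0,\mu_1)$. The curve $t\mapsto\mu_t$ is then a constant speed geodesic in $\mathscr{P}_\tau$ since it satisfies

$$\mathcal{W}(\mu_s,\mu_t) = \int_s^t\sqrt{\mathcal{A}(\mu_r,\boldsymbol{\nu}_r)}\,dr = (t-s)\,\mathcal{W}(\mu_0,\mu_1)\qquad\forall\,0\le s\le t\le1\ .$$

To show completeness let $(\mu^n)_n$ be a Cauchy sequence in $\mathscr{P}_\tau$. In particular the sequence is bounded w.r.t. $\mathcal{W}$ and we can find a subsequence (still indexed by $n$) and $\mu^\infty\in\mathscr{P}(\mathbb{R}^d)$ such that $\mu^n\rightharpoonup\mu^\infty$ weakly. Invoking lower semicontinuity of $\mathcal{W}$ and the Cauchy condition we infer $\mathcal{W}(\mu^n,\mu^\infty)\to0$ as $n\to\infty$ and $\mu^\infty\in\mathscr{P}_\tau$. □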

It is yet unclear when precisely the distance W is finite. However, we will 
see in the next section that the distance is finite e.g. along trajectories of 
the semigroup associated to a translation invariant jump kernel. 

The following result shows that under certain assumptions the distance $\mathcal{W}$ can be bounded from below by the $L^1$-Wasserstein distance. Recall that this distance is defined for $\mu_0,\mu_1\in\mathscr{P}(\mathbb{R}^d)$ by

$$W_1(\mu_0,\mu_1) := \inf_\pi\int|x-y|\,\pi(dx,dy)\ ,$$

where the infimum is taken over all probability measures $\pi\in\mathscr{P}(\mathbb{R}^d\times\mathbb{R}^d)$ whose first and second marginals are $\mu_0$ and $\mu_1$ respectively (see e.g. [27, Chap. 6]).

Proposition 4.5. Assume that the jump kernel $J$ satisfies

$$M^2 := \sup_x\int|x-y|^2\,J(x,dy) < \infty\ . \tag{4.3}$$

Then for any $\mu_0,\mu_1\in\mathscr{P}(\mathbb{R}^d)$ we have the bound

$$W_1(\mu_0,\mu_1) \le \frac{M}{\sqrt2}\,\mathcal{W}(\mu_0,\mu_1)\ .$$

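Proof. We can assume that $\mathcal{W}(\mu_0,\mu_1)<\infty$. Take a minimizing curve $(\mu,\boldsymbol{\nu})\in\mathcal{CE}_1(\mu_0,\mu_1)$ and let $\varphi:\mathbb{R}^d\to\mathbb{R}$ be a 1-Lipschitz function. Using the continuity equation in the form (3.4) and arguing similarly as in Lemma 2.6 we estimate

$$\Big|\int\varphi\,d\mu_1-\int\varphi\,d\mu_0\Big| = \Big|\frac12\int_0^1\!\!\int\bar\nabla\varphi\,d\boldsymbol{\nu}_t\,dt\Big| \le \frac12\int_0^1\!\!\int|x-y|\,|\boldsymbol{\nu}_t|(dx,dy)\,dt$$

$$\le \frac{1}{\sqrt2}\Big(\int_0^1\mathcal{A}(\mu_t,\boldsymbol{\nu}_t)\,dt\Big)^{1/2}\Big(\int_0^1\!\!\int|x-y|^2\,J(x,dy)\,\mu_t(dx)\,dt\Big)^{1/2} \le \frac{M}{\sqrt2}\,\mathcal{W}(\mu_0,\mu_1)\ .$$

Taking the supremum over all 1-Lipschitz functions $\varphi$ yields the claim by Kantorovich-Rubinstein duality (see e.g. [27, 5.16]). □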

We now give a characterization of absolutely continuous curves with respect to $\mathcal{W}$ and relate their length to their minimal action. Recall that a curve $(\mu_t)_{t\in[0,T]}$ in $\mathscr{P}(\mathbb{R}^d)$ is called absolutely continuous w.r.t. $\mathcal{W}$ if there exists $m\in L^1(0,T)$ such that

$$\mathcal{W}(\mu_s,\mu_t) \le \int_s^t m(r)\,dr\qquad\forall\,0\le s\le t\le T\ . \tag{4.4}$$

For an absolutely continuous curve the metric derivative defined by

$$|\mu_t'| := \lim_{h\to0}\frac{\mathcal{W}(\mu_{t+h},\mu_t)}{|h|}$$

exists for a.e. $t\in[0,T]$ and is the minimal $m$ in (4.4).

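Proposition 4.6 (Metric velocity). A curve $(\mu_t)_{t\in[0,T]}$ is absolutely continuous with respect to $\mathcal{W}$ if and only if there exists a Borel family $(\boldsymbol{\nu}_t)_{t\in[0,T]}$ such that $(\mu,\boldsymbol{\nu})\in\mathcal{CE}_T$ and

$$\int_0^T\sqrt{\mathcal{A}(\mu_t,\boldsymbol{\nu}_t)}\,dt < \infty\ .$$

In this case we have $|\mu_t'|^2\le\mathcal{A}(\mu_t,\boldsymbol{\nu}_t)$ for a.e. $t\in[0,T]$. Moreover, there exists a unique Borel family $\tilde{\boldsymbol{\nu}}_t$ with $(\mu,\tilde{\boldsymbol{\nu}})\in\mathcal{CE}_T$ such that

$$|\mu_t'|^2 = \mathcal{A}(\mu_t,\tilde{\boldsymbol{\nu}}_t)\qquad\text{for a.e. } t\in[0,T]\ . \tag{4.5}$$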



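Proof. The proof follows from the very same arguments as in [12, Thm. 5.17]. □

We can describe the optimal velocity measures $\tilde{\boldsymbol{\nu}}_t$ appearing in the preceding proposition in more detail. We define

$$T_\mu\mathscr{P}(\mathbb{R}^d) := \Big\{\boldsymbol{\nu}\in\mathcal{M}_{loc}(G)\ :\ \mathcal{A}(\mu,\boldsymbol{\nu})<\infty\,,\ \ \mathcal{A}(\mu,\boldsymbol{\nu})\le\mathcal{A}(\mu,\boldsymbol{\nu}+\boldsymbol{\eta})\ \ \forall\boldsymbol{\eta} : \bar\nabla\cdot\boldsymbol{\eta}=0\Big\}\ . \tag{4.6}$$

Here $\bar\nabla\cdot\boldsymbol{\eta}=0$ is understood in a weak sense, i.e.

$$\int\bar\nabla\varphi(x,y)\,\boldsymbol{\eta}(dx,dy) = 0\qquad\forall\varphi\in C_c^\infty(\mathbb{R}^d)\ .$$

Corollary 4.7. Let $(\mu,\boldsymbol{\nu})\in\mathcal{CE}_T$ be such that the curve $t\mapsto\mu_t$ is absolutely continuous w.r.t. $\mathcal{W}$. Then $\boldsymbol{\nu}$ satisfies (4.5) if and only if $\boldsymbol{\nu}_t\in T_{\mu_t}\mathscr{P}(\mathbb{R}^d)$ for a.e. $t\in[0,T]$.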



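In the light of the formal Riemannian interpretation of the distance $\mathcal{W}$ we view $T_\mu\mathscr{P}(\mathbb{R}^d)$ as the tangent space to $\mathscr{P}(\mathbb{R}^d)$ at the measure $\mu$. If $\mu$ is absolutely continuous with respect to $m$ we can give an explicit description of $T_\mu\mathscr{P}(\mathbb{R}^d)$ as a subspace of an $L^2$ space. For this recall that we denote by $Jm\in\mathcal{M}_{loc}(G)$ the measure given by $Jm(dx,dy)=J(x,dy)\,m(dx)$.

Proposition 4.8. Let $\mu=\rho m\in\mathscr{P}(\mathbb{R}^d)$. Then we have $\boldsymbol{\nu}\in T_\mu\mathscr{P}(\mathbb{R}^d)$ if and only if $\boldsymbol{\nu}=w\hat\rho\,Jm$ is absolutely continuous w.r.t. the measure $\hat\rho\,Jm$ and

$$w\in\overline{\{\bar\nabla\varphi\ |\ \varphi\in C_c^\infty(\mathbb{R}^d)\}}^{L^2(\hat\rho Jm)} =: T_\rho\ .$$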



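Proof. If $\mathcal{A}(\mu,\boldsymbol{\nu})$ is finite we infer from Lemma 2.3 that $\boldsymbol{\nu}=w\hat\rho\,Jm$ for some density $w:G\to\mathbb{R}$ and that $\mathcal{A}(\mu,\boldsymbol{\nu})=\frac12\|w\|^2_{L^2(\hat\rho Jm)}$. Now the optimality condition in (4.6) is equivalent to

$$\|w\|_{L^2(\hat\rho Jm)} \le \|w+v\|_{L^2(\hat\rho Jm)}\qquad\forall v\in N_\rho\ ,$$

where $N_\rho := \{v\in L^2(\hat\rho Jm) : \int\bar\nabla\varphi\,v\,\hat\rho\,d(Jm)=0\ \ \forall\varphi\in C_c^\infty(\mathbb{R}^d)\}$. This implies the assertion of the proposition after noting that $N_\rho$ is the orthogonal complement of $T_\rho$ in $L^2(\hat\rho Jm)$. □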

The convexity and monotonicity properties of the action functional es- 
tablished in Section 2 extend naturally to the distance function. 
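As a hedged numerical illustration of the convexity that the distance inherits from the action, the following Python sketch checks midpoint convexity of the action density on random samples. It assumes that the density has the form $\alpha(w,s,t)=w^2/\theta(s,t)$ with $\theta$ the logarithmic mean, as in Section 2; this form, the sampling ranges and all function names are assumptions of the sketch, and a passing check is a sanity test rather than a proof.

```python
import math
import random

def log_mean(s, t):
    """Logarithmic mean theta(s, t) = (s - t) / (log s - log t) for s, t > 0.

    Uses log(s/t) = 2 * atanh((s - t) / (s + t)), which stays accurate near s = t.
    """
    if s == t:
        return s  # limit value on the diagonal
    return (s - t) / (2.0 * math.atanh((s - t) / (s + t)))

def alpha(w, s, t):
    """Assumed action density alpha(w, s, t) = w^2 / theta(s, t)."""
    return w * w / log_mean(s, t)

def max_midpoint_violation(samples=10000, seed=0):
    """Largest observed value of alpha(midpoint) minus the average of alpha at the endpoints."""
    rng = random.Random(seed)
    worst = -math.inf
    for _ in range(samples):
        p = (rng.uniform(-5, 5), rng.uniform(0.01, 5), rng.uniform(0.01, 5))
        q = (rng.uniform(-5, 5), rng.uniform(0.01, 5), rng.uniform(0.01, 5))
        mid = tuple(0.5 * (a + b) for a, b in zip(p, q))
        worst = max(worst, alpha(*mid) - 0.5 * (alpha(*p) + alpha(*q)))
    return worst

if __name__ == "__main__":
    # A non-positive value (up to rounding) is consistent with joint convexity.
    print(max_midpoint_violation())
```

Joint convexity in all three arguments is what allows convex combinations of curves and velocity measures to be compared with the combination of their actions.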

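Proposition 4.9 (Convexity of the distance). Let $\mu_0^j,\mu_1^j\in\mathscr{P}(\mathbb{R}^d)$ for $j=0,1$. For $\tau\in[0,1]$ and $k=0,1$ set $\mu_k^\tau=\tau\mu_k^1+(1-\tau)\mu_k^0$. Then we have:

$$\mathcal{W}(\mu_0^\tau,\mu_1^\tau)^2 \le \tau\,\mathcal{W}(\mu_0^1,\mu_1^1)^2 + (1-\tau)\,\mathcal{W}(\mu_0^0,\mu_1^0)^2\ .$$

Proof. We can assume that $\mathcal{W}(\mu_0^j,\mu_1^j)$ is finite for $j=0,1$ and choose minimizing curves $(\mu^j,\boldsymbol{\nu}^j)\in\mathcal{CE}_1(\mu_0^j,\mu_1^j)$. Then for $t\in[0,1]$ set $\mu_t^\tau=\tau\mu_t^1+(1-\tau)\mu_t^0$ and $\boldsymbol{\nu}_t^\tau=\tau\boldsymbol{\nu}_t^1+(1-\tau)\boldsymbol{\nu}_t^0$. Observe that $(\mu^\tau,\boldsymbol{\nu}^\tau)\in\mathcal{CE}_1(\mu_0^\tau,\mu_1^\tau)$. From the definition of $\mathcal{W}$ and the convexity of $\mathcal{A}$ as stated in Lemma 2.7 we infer

$$\mathcal{W}(\mu_0^\tau,\mu_1^\tau)^2 \le \int_0^1\mathcal{A}(\mu_t^\tau,\boldsymbol{\nu}_t^\tau)\,dt \le \int_0^1\tau\,\mathcal{A}(\mu_t^1,\boldsymbol{\nu}_t^1)+(1-\tau)\,\mathcal{A}(\mu_t^0,\boldsymbol{\nu}_t^0)\,dt = \tau\,\mathcal{W}(\mu_0^1,\mu_1^1)^2 + (1-\tau)\,\mathcal{W}(\mu_0^0,\mu_1^0)^2\ .$$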
□

Proposition 4.10 (Monotonicity under convolution). Let $\mu_0,\mu_1\in\mathscr{P}(\mathbb{R}^d)$. Assume that $J$ satisfies (2.6) and let $m$ be Lebesgue measure. Let $k$ be a convolution kernel. Then we have

$$\mathcal{W}(\mu_0*k,\mu_1*k) \le \mathcal{W}(\mu_0,\mu_1)\ .$$

If we set $k_\varepsilon(x)=\varepsilon^{-d}k(x/\varepsilon)$, then as $\varepsilon\searrow0$ we have

$$\mathcal{W}(\mu_0*k_\varepsilon,\mu_1*k_\varepsilon) \to \mathcal{W}(\mu_0,\mu_1)\ .$$

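Proof. Assume that $\mathcal{W}(\mu_0,\mu_1)$ is finite, as otherwise there is nothing to prove. Let $(\mu,\boldsymbol{\nu})\in\mathcal{CE}_1(\mu_0,\mu_1)$ be a minimizing curve according to Proposition 4.3. Define $\tilde\mu_t=\mu_t*k$, $\tilde{\boldsymbol{\nu}}_t=\boldsymbol{\nu}_t*k$. We claim that $(\tilde\mu,\tilde{\boldsymbol{\nu}})\in\mathcal{CE}_1(\mu_0*k,\mu_1*k)$. Indeed, let us show that the continuity equation (v) in (3.6) holds for $(\tilde\mu,\tilde{\boldsymbol{\nu}})$. The other properties are equally easy to verify. So let $\varphi\in C_c^\infty((0,1)\times\mathbb{R}^d)$ and set $\tilde\varphi(t,x)=\int\varphi(t,x+z)k(z)\,dz$. Using the continuity equation for $(\mu,\boldsymbol{\nu})$ and (2.7) we obtain

$$\int_0^1\!\!\int\partial_t\varphi\,d\tilde\mu_t\,dt = \int_0^1\!\!\int\!\!\int\partial_t\varphi(t,x+z)\,k(z)\,dz\,\mu_t(dx)\,dt = \int_0^1\!\!\int\partial_t\tilde\varphi\,d\mu_t\,dt = -\frac12\int_0^1\!\!\int\bar\nabla\tilde\varphi\,d\boldsymbol{\nu}_t\,dt$$

$$= -\frac12\int_0^1\!\!\int\!\!\int\bar\nabla\varphi(t,x+z,y+z)\,k(z)\,\boldsymbol{\nu}_t(dx,dy)\,dz\,dt = -\frac12\int_0^1\!\!\int\bar\nabla\varphi\,d\tilde{\boldsymbol{\nu}}_t\,dt\ .$$

Now the first assertion follows immediately from Proposition 2.8. This in turn, together with weak lower semicontinuity of $\mathcal{W}$ (see Theorem 4.4), yields the second assertion. □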

5. Geodesic convexity and gradient flow of the entropy 

In this section we focus on a translation invariant jump kernel $J$ and will identify the evolution equation (1.1) as the gradient flow of the relative entropy in the framework of gradient flows in metric spaces developed in [1]. So let us assume from now on that $J$ satisfies

$$J(x-z,A) = J(x,A+z)\qquad\forall x,z\in\mathbb{R}^d,\ A\in\mathcal{B}(\mathbb{R}^d)$$

and that $m$ is Lebesgue measure on $\mathbb{R}^d$. Moreover we assume that $\theta$ is the logarithmic mean defined by (2.2). Under these assumptions we can write

$$J(x,A) = \nu(A-x)\qquad\forall x\in\mathbb{R}^d,\ A\in\mathcal{B}(\mathbb{R}^d)\ ,$$

where $\nu$ is a Lévy measure, i.e. a Borel measure on $\mathbb{R}^d\setminus\{0\}$ satisfying

$$\int(1\wedge|y|^2)\,\nu(dy) < \infty\ .$$

Now the evolution equation takes the form

$$\partial_t\rho = \mathcal{L}\rho\ ,$$

where the operator $\mathcal{L}$ is given by

$$\mathcal{L}\rho(x) := \int\big(\rho(x+y)-\rho(x)-y\cdot\nabla\rho(x)\,\mathbf{1}_{\{|y|\le1\}}\big)\,\nu(dy)\ .$$

Note that $\mathcal{L}$ is also the generator of the Lévy process $X$ with vanishing drift and diffusion and with Lévy measure $\nu$ (see e.g. [4] for background on Lévy processes). It is a pseudo differential operator whose symbol is given by the Lévy-Khinchine formula

$$\eta(\xi) = \int\big(e^{i\langle y,\xi\rangle}-1-i\langle y,\xi\rangle\,\mathbf{1}_{\{|y|\le1\}}\big)\,\nu(dy)\ .$$

This means that $\mathcal{F}(\mathcal{L}\rho)=\eta\,\mathcal{F}(\rho)$, where $\mathcal{F}$ denotes the Fourier transform. Recall that the law of $X_t$ can be given explicitly in terms of its Fourier transform. Namely, we have

$$\mathbb{E}\big[\exp(i\langle\xi,X_t\rangle)\big] = \exp(t\,\eta(\xi))\ .$$

Throughout this section we will make the following assumption on v in terms 
of the law of the associated Levy process. 

Assumption 5.1. Assume that the law of the process $X_t$ has a density $\psi_t$ such that $\psi_t>0$ for all $t>0$. Moreover, assume that $\psi:(0,\infty)\times\mathbb{R}^d\to\mathbb{R}_+$ is such that $\psi_t$, $\mathcal{L}\psi_t$ are rapidly decreasing functions locally uniformly in $t$.

Remark 5.2. This is a technical assumption made to simplify the presentation. It is used to ensure convergence of integrals in the proof of Theorem 5.5 and could be weakened substantially. Still, Assumption 5.1 is fulfilled for example when $\nu(dy)=c_\alpha|y|^{-\alpha-d}\,dy$ for $\alpha\in(0,2)$. For a suitable constant $c_\alpha$ the Lévy process $X$ is then the symmetric, isotropic $\alpha$-stable process and the symbol is given by $\eta(\xi)=-|\xi|^\alpha$.
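The scaling behind the $\alpha$-stable example can be checked numerically. The sketch below works in dimension $d=1$ and assumes the Lévy measure $|y|^{-1-\alpha}\,dy$ with the normalising constant set to 1, so the computed symbol is a negative multiple of $|\xi|^\alpha$ rather than exactly $-|\xi|^\alpha$. It also assumes the sign convention $\mathbb{E}[\exp(i\xi X_t)]=\exp(t\,\eta(\xi))$, under which the symbol is non-positive. By symmetry of the measure the compensator term cancels and $\eta(\xi)=-2\int_0^\infty(1-\cos(y\xi))\,y^{-1-\alpha}\,dy$. The function name, truncation radius and step count are illustrative choices, and the plain midpoint rule is only adequate for roughly $\alpha\le1$.

```python
import math

def stable_symbol(xi, alpha, cutoff=200.0, steps=200_000):
    """Approximate eta(xi) = -2 * int_0^inf (1 - cos(y*xi)) * y**(-1-alpha) dy.

    Midpoint rule on (0, cutoff]; beyond the cutoff the oscillating cosine part
    is neglected and int_cutoff^inf y**(-1-alpha) dy is added in closed form.
    """
    h = cutoff / steps
    total = 0.0
    for k in range(steps):
        y = (k + 0.5) * h
        total += (1.0 - math.cos(y * xi)) * y ** (-1.0 - alpha)
    tail = cutoff ** (-alpha) / alpha
    return -2.0 * (total * h + tail)

if __name__ == "__main__":
    for a in (0.5, 1.0):
        ratio = stable_symbol(2.0, a) / stable_symbol(1.0, a)
        # The ratio should be close to 2**a, reflecting eta(c*xi) = c**a * eta(xi).
        print(a, ratio, 2.0 ** a)
```

For $\alpha=1$ the integral is known in closed form, $\int_0^\infty(1-\cos u)\,u^{-2}\,du=\pi/2$, which gives an independent check of the quadrature.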

Recall that a smooth function $f:\mathbb{R}^d\to\mathbb{R}$ is called rapidly decreasing if $|x^\beta D^\alpha f(x)|\to0$ as $|x|\to\infty$ for any multi-indices $\alpha,\beta$. We obtain a semigroup $(P_t)_{t\ge0}$ on $\mathscr{P}(\mathbb{R}^d)$ endowed with the distance $\mathcal{W}$ by setting

$$P_t[\mu] := \mu*\psi_t\ .$$

For $\boldsymbol{\nu}\in\mathcal{M}(G)$ we set

$$P_t[\boldsymbol{\nu}] := \boldsymbol{\nu}*\psi_t\ ,$$

with the convolution being understood in the sense of (2.7). Proposition 4.10 shows that $P$ is a $C^0$-semigroup in the sense that $P_t[\mu]\to\mu$ weakly as $t\to0$. Moreover, $P_t[\mu]=\rho_t m$ is absolutely continuous w.r.t. Lebesgue measure for any $\mu\in\mathscr{P}(\mathbb{R}^d)$ and the density $\rho_t$ satisfies $\partial_t\rho_t=\mathcal{L}\rho_t$.

The notion of gradient flow can be defined in abstract metric spaces and 
has been studied extensively in this setting (see [1]). Of particular interest 
are gradient flows of functionals that are geodesically (semi-) convex. In 
this situation the gradient flow is characterized by the so called "Evolution 
Variational Inequality" (EVI) . We adopt the following definition. 

Definition 5.3. Let $(X,d)$ be a metric space and $F:X\to(-\infty,\infty]$ a lower semicontinuous function. Further let $(S_t)_{t\ge0}$ be a $C^0$-semigroup on $X$ and $\lambda\in\mathbb{R}$. $S$ is called the ($\lambda$-)gradient flow of $F$ if $S_t(X)\subset D(F)$ for all $t>0$, the map $t\mapsto F(S_t(u))$ is non-increasing for all $u\in X$ and if for all $u\in X$, $v\in D(F)$, $t\ge0$:

$$\frac12\frac{d^+}{dt}d^2(S_t(u),v) + \frac{\lambda}{2}d^2(S_t(u),v) + F(S_t(u)) \le F(v)\ . \tag{5.1}$$

Here $D(F) := \{x\in X\ |\ F(x)<\infty\}$ denotes the proper domain of the function $F$.

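We will apply this definition in the case where $X=\mathscr{P}(\mathbb{R}^d)$ and $F$ is the relative entropy $\mathcal{H}$ defined for $\mu\in\mathscr{P}(\mathbb{R}^d)$ by

$$\mathcal{H}(\mu) := \begin{cases}\int\rho\log\rho\,dm\ , & \text{if }\mu=\rho m\text{ and }\int(\rho\log\rho)_+\,dm<\infty\ ,\\ +\infty\ , & \text{else.}\end{cases}$$

Let us start by stating a result giving the entropy production along the semigroup $P$. As before, we will denote by $Jm\in\mathcal{M}_{loc}(G)$ the measure given by $Jm(dx,dy)=J(x,dy)\,m(dx)$. For a probability measure $\mu\in\mathscr{P}(\mathbb{R}^d)$ we define a non-local analogue of the Fisher information by

$$\mathcal{I}(\mu) := \begin{cases}\frac12\int\bar\nabla\rho\,\bar\nabla\log\rho\ d(Jm)\ , & \text{if }\mu=\rho m\text{ and }\rho>0\ ,\\ +\infty\ , & \text{else.}\end{cases}$$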

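Proposition 5.4. Let $\mu\in\mathscr{P}(\mathbb{R}^d)$ and set $\mu_t=\rho_t m := P_t[\mu]$. For every $t>0$ we have $\mathcal{H}(\mu_t)\in(-\infty,\infty)$ and $\mathcal{I}(\mu_t)<\infty$. Moreover, we have the energy identity

$$\mathcal{H}(\mu_t)-\mathcal{H}(\mu_s) = -\int_s^t\mathcal{I}(\mu_r)\,dr\qquad\forall\,t\ge s>0\ . \tag{5.3}$$

In particular the map $t\mapsto\mathcal{H}(\mu_t)$ is non-increasing.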

Proof. Finiteness of $\mathcal{H}(\mu_t)$ follows readily from the fact that $\psi_t$ is rapidly decreasing. We prove (5.3) by approximating $\mathcal{H}$ with functionals $\mathcal{H}_n$. Let us set

$$f_n(u) := \int_0^u\max\big(1+\log(r),-n\big)\,dr\ . \tag{5.4}$$

Then we have $f_n(u)\searrow u\log(u)$ and $f_n'(u)\searrow1+\log(u)$ as $n\to\infty$. For $\mu=\rho m\in\mathscr{P}(\mathbb{R}^d)$ we set $\mathcal{H}_n(\mu) := \int f_n(\rho)\,dm$. Now we calculate

$$\mathcal{H}_n(\mu_t)-\mathcal{H}_n(\mu_s) = \int f_n(\rho_t)-f_n(\rho_s)\,dm = \int_s^t\!\!\int f_n'(\rho_r)\,\mathcal{L}\rho_r\,dm\,dr = -\frac12\int_s^t\!\!\int\bar\nabla f_n'(\rho_r)\,\bar\nabla\rho_r\ d(Jm)\,dr\ .$$

The interchange of integrals and the integration by parts are easily justified by the fact that $f_n'(\rho_r)$ is bounded and $\mathcal{L}\rho_r$ is rapidly decreasing locally uniformly in $r$. Letting finally $n\to\infty$ we obtain (5.3) by monotone convergence of both the left and right hand sides. □
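The energy identity (5.3) has a finite-dimensional analogue that can be checked numerically. The sketch below is a toy model under stated assumptions, not the setting of the proposition: it replaces $\mathbb{R}^d$ by three states with symmetric jump rates `J[i][j]`, takes $m$ to be counting measure, and integrates $\partial_t\rho=\mathcal{L}\rho$ with a classical Runge-Kutta step. All names and numerical values are hypothetical.

```python
import math

# Symmetric jump rates on a three-point state space (illustrative values).
J = [[0.0, 1.0, 0.5],
     [1.0, 0.0, 2.0],
     [0.5, 2.0, 0.0]]

def generator(rho):
    """(L rho)(i) = sum_j (rho[j] - rho[i]) * J[i][j]."""
    n = len(rho)
    return [sum((rho[j] - rho[i]) * J[i][j] for j in range(n)) for i in range(n)]

def entropy(rho):
    """H(rho) = sum_i rho_i * log(rho_i), density w.r.t. counting measure."""
    return sum(r * math.log(r) for r in rho)

def fisher(rho):
    """I(rho) = 1/2 * sum_{i,j} (rho_j - rho_i) * (log rho_j - log rho_i) * J[i][j]."""
    n = len(rho)
    return 0.5 * sum((rho[j] - rho[i]) * (math.log(rho[j]) - math.log(rho[i])) * J[i][j]
                     for i in range(n) for j in range(n))

def rk4_step(rho, h):
    """One classical Runge-Kutta step for d(rho)/dt = L rho."""
    k1 = generator(rho)
    k2 = generator([r + 0.5 * h * k for r, k in zip(rho, k1)])
    k3 = generator([r + 0.5 * h * k for r, k in zip(rho, k2)])
    k4 = generator([r + h * k for r, k in zip(rho, k3)])
    return [r + h / 6.0 * (a + 2 * b + 2 * c + d)
            for r, a, b, c, d in zip(rho, k1, k2, k3, k4)]

def entropy_balance(rho0, t_end=1.0, steps=1000):
    """Return (H(rho_T) - H(rho_0), -int_0^T I(rho_r) dr), trapezoidal rule in time."""
    h = t_end / steps
    rho = list(rho0)
    dissipation = 0.0
    for _ in range(steps):
        nxt = rk4_step(rho, h)
        dissipation += 0.5 * h * (fisher(rho) + fisher(nxt))
        rho = nxt
    return entropy(rho) - entropy(rho0), -dissipation

if __name__ == "__main__":
    lhs, rhs = entropy_balance([0.7, 0.2, 0.1])
    print(lhs, rhs)  # the two numbers should agree up to discretisation error
```

In this toy model the identity follows from the same symmetrisation as in the proof: summing $\log\rho_i\,(\mathcal{L}\rho)_i$ over $i$ and exchanging the roles of $i$ and $j$ produces the factor $\frac12$ in the discrete Fisher information.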

We will now show that the semigroup $(P_t)$ is the gradient flow of the relative entropy with respect to the distance $\mathcal{W}$. Our strategy of proof is inspired by an argument developed in [11] and used in a similar form in [12, Thm. 5.29]. Recall that $\mathcal{W}$ is a pseudo distance, thus it is necessary to consider the sets $\mathscr{P}_\tau := \{\mu\in\mathscr{P}(\mathbb{R}^d) : \mathcal{W}(\mu,\tau)<\infty\}$ for a given $\tau\in\mathscr{P}(\mathbb{R}^d)$. The following two results are a restatement of Theorem 1.3.

Theorem 5.5. Let $\mu\in\mathscr{P}_\tau$ and set $\mu_t := P_t[\mu]$. Then $\mu_t\in D(\mathcal{H})\cap\mathscr{P}_\tau$ for all $t>0$ and the map $t\mapsto\mathcal{H}(\mu_t)$ is non-increasing. Moreover, for any $\sigma\in\mathscr{P}_\tau$ the Evolution Variational Inequality holds:

$$\frac12\frac{d^+}{dt}\mathcal{W}^2(\mu_t,\sigma) + \mathcal{H}(\mu_t) \le \mathcal{H}(\sigma)\qquad\forall\,t\ge0\ . \tag{5.5}$$

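Proof. The first statement is a direct consequence of Proposition 5.4. For the second statement it is sufficient to assume $\mu\in D(\mathcal{H})$ and prove the inequality at $t=0$. So let $\sigma\in D(\mathcal{H})$ and let $(\mu_s,\boldsymbol{\nu}_s)_{s\in[0,1]}$ be a minimizing curve connecting $\mu_0 := \sigma$ to $\mu_1 := \mu$. We write $\sigma_\varepsilon := P_\varepsilon[\sigma]$ and set

$$\mu^\varepsilon_{s,t} = \rho^\varepsilon_{s,t}\,m := P_{st+\varepsilon}[\mu_s]\qquad\text{and}\qquad\boldsymbol{\nu}^\varepsilon_{s,t} = v^\varepsilon_{s,t}\,Jm := P_{st+\varepsilon}[\boldsymbol{\nu}_s]\ .$$

The couple $(\mu^\varepsilon_{\cdot,t},\boldsymbol{\nu}^\varepsilon_{\cdot,t})$ does not satisfy the continuity equation. Hence we make the correction

$$\tilde{\boldsymbol{\nu}}^\varepsilon_{s,t} = \tilde v^\varepsilon_{s,t}\,Jm := \big(v^\varepsilon_{s,t}-t\,\bar\nabla\rho^\varepsilon_{s,t}\big)\,Jm\ .$$

We will need the following result whose proof we postpone for the moment.

Claim 5.6. We have $(\mu^\varepsilon_{\cdot,t},\tilde{\boldsymbol{\nu}}^\varepsilon_{\cdot,t})\in\mathcal{CE}_1(\sigma_\varepsilon,\mu_{\varepsilon+t})$ and moreover,

$$\mathcal{H}(\mu_{\varepsilon+t})-\mathcal{H}(\sigma_\varepsilon) = \frac12\int_0^1\!\!\int\bar\nabla\log\rho^\varepsilon_{s,t}\ \tilde v^\varepsilon_{s,t}\ d(Jm)\,ds\ . \tag{5.6}$$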

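From the definition of the distance $\mathcal{W}$ we now obtain the estimate

$$\mathcal{W}(\mu_{t+\varepsilon},\sigma_\varepsilon)^2 \le \int_0^1\mathcal{A}(\mu^\varepsilon_{s,t},\tilde{\boldsymbol{\nu}}^\varepsilon_{s,t})\,ds\ . \tag{5.7}$$

Recall the notation $\hat\rho(x,y)=\theta(\rho(x),\rho(y))$ with $\theta$ being the logarithmic mean here. We can further estimate

$$\mathcal{A}(\mu^\varepsilon_{s,t},\tilde{\boldsymbol{\nu}}^\varepsilon_{s,t}) = \frac12\int\frac{|\tilde v^\varepsilon_{s,t}|^2}{\hat\rho^\varepsilon_{s,t}}\,d(Jm) = \frac12\int\Big(|v^\varepsilon_{s,t}|^2-2t\,\bar\nabla\rho^\varepsilon_{s,t}\,v^\varepsilon_{s,t}+t^2|\bar\nabla\rho^\varepsilon_{s,t}|^2\Big)\frac{1}{\hat\rho^\varepsilon_{s,t}}\,d(Jm)$$

$$\le \mathcal{A}(\mu^\varepsilon_{s,t},\boldsymbol{\nu}^\varepsilon_{s,t}) - t\int\bar\nabla\log\rho^\varepsilon_{s,t}\ \tilde v^\varepsilon_{s,t}\ d(Jm) \le \mathcal{A}(\mu_s,\boldsymbol{\nu}_s) - t\int\bar\nabla\log\rho^\varepsilon_{s,t}\ \tilde v^\varepsilon_{s,t}\ d(Jm)\ ,$$

where we have dropped the quadratic term in $t$ and used the monotonicity under convolution (Proposition 2.8) in the last inequality. Integration over $s$ from 0 to 1 and using (5.6) gives

$$\frac12\mathcal{W}(\mu_{t+\varepsilon},\sigma_\varepsilon)^2 \le \frac12\mathcal{W}(\mu,\sigma)^2 - t\,\big(\mathcal{H}(\mu_{t+\varepsilon})-\mathcal{H}(\sigma_\varepsilon)\big)\ .$$

By lower semicontinuity of $\mathcal{W}$ (see Theorem 4.4) and continuity of $\mathcal{H}$ along the semigroup we can take the limit $\varepsilon\to0$ and obtain

$$\frac12\mathcal{W}(\mu_t,\sigma)^2 \le \frac12\mathcal{W}(\mu,\sigma)^2 - t\,\big(\mathcal{H}(\mu_t)-\mathcal{H}(\sigma)\big)\ .$$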

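Finally, rearranging terms and letting $t\searrow0$ yields (5.5).

Proof of Claim 5.6. For the proof we first need two estimates. First note that

$$\int_0^1\mathcal{I}(\mu^\varepsilon_{s,t})\,ds < \infty\ . \tag{5.8}$$

Indeed, by convexity of the map $(u,v)\mapsto(u-v)(\log u-\log v)$ we have that $\mathcal{I}(\mu*\psi_t)\le\mathcal{I}(\psi_t m)$ for every $\mu\in\mathscr{P}(\mathbb{R}^d)$. Hence we conclude from Proposition 5.4 that

$$\int_0^1\mathcal{I}(\mu^\varepsilon_{s,t})\,ds \le \int_0^1\mathcal{I}(\psi_{\varepsilon+st}\,m)\,ds = \frac1t\Big(\mathcal{H}(\psi_\varepsilon m)-\mathcal{H}(\psi_{\varepsilon+t}m)\Big) < \infty\ .$$

From this we conclude that the curve $(\mu^\varepsilon_{\cdot,t},\tilde{\boldsymbol{\nu}}^\varepsilon_{\cdot,t})$ has finite action. Indeed,

$$\int_0^1\mathcal{A}(\mu^\varepsilon_{s,t},\tilde{\boldsymbol{\nu}}^\varepsilon_{s,t})\,ds \le \int_0^1\!\!\int\frac{|v^\varepsilon_{s,t}|^2+t^2|\bar\nabla\rho^\varepsilon_{s,t}|^2}{\hat\rho^\varepsilon_{s,t}}\,d(Jm)\,ds \le 2\int_0^1\mathcal{A}(\mu_s,\boldsymbol{\nu}_s)\,ds + 2t^2\int_0^1\mathcal{I}(\mu^\varepsilon_{s,t})\,ds < \infty\ ,$$

where we use Proposition 2.8 in the last inequality. Using Lemma 2.6 and the previous estimate we see that $\tilde{\boldsymbol{\nu}}^\varepsilon_{\cdot,t}$ satisfies the integrability condition (iv) in Definition 3.2. The other conditions are also easily checked. Hence we see $(\mu^\varepsilon_{\cdot,t},\tilde{\boldsymbol{\nu}}^\varepsilon_{\cdot,t})\in\mathcal{CE}_1(\sigma_\varepsilon,\mu_{\varepsilon+t})$.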

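Now let us prove (5.6). By a simple convolution argument we can assume that $\rho^\varepsilon_{s,t}$ is differentiable in $s$. Let $f_n$ be the function defined by (5.4) and set $f(u)=u\log(u)$ for $u>0$. Now we calculate

$$\mathcal{H}_n(\mu_{\varepsilon+t})-\mathcal{H}_n(\sigma_\varepsilon) = \int_0^1\!\!\int f_n'(\rho^\varepsilon_{s,t})\,\partial_s\rho^\varepsilon_{s,t}\,dm\,ds\ .$$

Note that the map $x\mapsto f_n'(\rho^\varepsilon_{s,t}(x))$ is bounded and Lipschitz uniformly in $s\in[0,1]$. Using the integrability condition (iv) from Definition 3.2 we can approximate it by functions in $C_c^\infty((0,1)\times\mathbb{R}^d)$ and obtain by the continuity equation

$$\mathcal{H}_n(\mu_{\varepsilon+t})-\mathcal{H}_n(\sigma_\varepsilon) = \frac12\int_0^1\!\!\int\bar\nabla f_n'(\rho^\varepsilon_{s,t})\ \tilde v^\varepsilon_{s,t}\ d(Jm)\,ds\ . \tag{5.9}$$

By monotone convergence the left hand side of (5.9) converges to the left hand side of (5.6). It remains to prove convergence of the right hand side. Using the Hölder inequality we estimate

$$\Big|\int_0^1\!\!\int\bar\nabla\big(f'(\rho^\varepsilon_{s,t})-f_n'(\rho^\varepsilon_{s,t})\big)\,d\tilde{\boldsymbol{\nu}}^\varepsilon_{s,t}\,ds\Big| \le \int_0^1\!\!\int\big|\bar\nabla\big(f'(\rho^\varepsilon_{s,t})-f_n'(\rho^\varepsilon_{s,t})\big)\big|\,|\tilde v^\varepsilon_{s,t}|\,d(Jm)\,ds$$

$$\le \Big(2\int_0^1\mathcal{A}(\mu^\varepsilon_{s,t},\tilde{\boldsymbol{\nu}}^\varepsilon_{s,t})\,ds\Big)^{1/2}\Big(\int_0^1\!\!\int\big|\bar\nabla\big(f'(\rho^\varepsilon_{s,t})-f_n'(\rho^\varepsilon_{s,t})\big)\big|^2\,\hat\rho^\varepsilon_{s,t}\,d(Jm)\,ds\Big)^{1/2}\ .$$

The integrand in the last term is bounded as

$$\big|\bar\nabla\big(f'(\rho^\varepsilon_{s,t})-f_n'(\rho^\varepsilon_{s,t})\big)\big|^2\,\hat\rho^\varepsilon_{s,t} \le \big|\bar\nabla f'(\rho^\varepsilon_{s,t})\big|^2\,\hat\rho^\varepsilon_{s,t} = \bar\nabla\log\rho^\varepsilon_{s,t}\ \bar\nabla\rho^\varepsilon_{s,t}\ .$$

With the help of (5.8) and dominated convergence we conclude convergence of the right hand side of (5.9) to the right hand side of (5.6). □

□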

Corollary 5.7. The entropy is convex along $\mathcal{W}$-geodesics. More precisely, let $\mu_0,\mu_1\in\mathscr{P}(\mathbb{R}^d)$ be such that $\mathcal{W}(\mu_0,\mu_1)<\infty$ and let $(\mu_t)_{t\in[0,1]}$ be a geodesic connecting $\mu_0$ and $\mu_1$. Then we have

$$\mathcal{H}(\mu_t) \le (1-t)\,\mathcal{H}(\mu_0) + t\,\mathcal{H}(\mu_1)\qquad\forall\,t\in[0,1]\ .$$

Proof. This is a direct consequence of Theorem 5.5 and the fact, proved in [11, Thm. 3.2], that in a general setting the Evolution Variational Inequality implies geodesic convexity. □

We finish by giving an equivalent and more intuitive definition of the distance $\mathcal{W}$ in the present setting of a translation invariant jump kernel $J$. We show that it coincides with the distance defined in (1.5). We introduce the following shorthand notation. Given functions $\rho:\mathbb{R}^d\to\mathbb{R}_+$ and $\psi:\mathbb{R}^d\to\mathbb{R}$ we write

$$\mathcal{A}'(\rho,\psi) := \frac12\int\big(\psi(y)-\psi(x)\big)^2\,\hat\rho(x,y)\,J(x,dy)\,m(dx)\ .$$

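For two probability densities $\bar\rho_0,\bar\rho_1$ w.r.t. $m$ and $T>0$ let us denote by $\mathcal{CE}'_T(\bar\rho_0,\bar\rho_1)$ the collection of pairs $(\rho,\psi)$ satisfying the following conditions:

(i) $\rho:[0,T]\times\mathbb{R}^d\to\mathbb{R}_+$ is measurable;
(ii) $\rho_t$ is a probability density for all $t\in[0,T]$;
(iii) the curve $t\mapsto\mu_t := \rho_t m$ is weakly continuous;
(iv) $\psi:[0,T]\times\mathbb{R}^d\to\mathbb{R}$ is measurable;
(v) $\partial_t\rho_t+\bar\nabla\cdot(\hat\rho_t\bar\nabla\psi_t)=0$, $\rho_0=\bar\rho_0$, $\rho_T=\bar\rho_1$.   (5.10)

Here the continuity equation (v) is understood in the sense that for every test function $\varphi\in C_c^\infty((0,T)\times\mathbb{R}^d)$ we have

$$\int_0^T\!\!\int\partial_t\varphi\,\rho_t\,dm\,dt + \frac12\int_0^T\!\!\int\!\!\int\bar\nabla\varphi(x,y)\,\bar\nabla\psi_t(x,y)\,\hat\rho_t(x,y)\,J(x,dy)\,m(dx)\,dt = 0\ .$$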

Proposition 5.8. Assume that $m$ is Lebesgue measure and that $J(x,dy)=j(y-x)\,dy$ for a function $j:\mathbb{R}^d\setminus\{0\}\to\mathbb{R}_+$ that is strictly positive. Moreover, assume that $J$ satisfies Assumption 5.1. Let $\mu_i=\rho_i m\in\mathscr{P}(\mathbb{R}^d)$ for $i=0,1$ be such that $\mathcal{I}(\mu_i)$ is finite. Then we have

$$\mathcal{W}(\mu_0,\mu_1)^2 = \inf\Big\{\int_0^1\mathcal{A}'(\rho_t,\psi_t)\,dt\ :\ (\rho,\psi)\in\mathcal{CE}'_1(\rho_0,\rho_1)\Big\}\ .$$

Note that the assumptions above on the jump kernel $J$ are satisfied by the kernel $J_\alpha$ associated to the fractional Laplacian.

Proof. The inequality '$\le$' follows easily by noting that the infimum in the definition of $\mathcal{W}$ is taken over a larger set. Indeed, given a pair $(\rho,\psi)\in\mathcal{CE}'_1(\rho_0,\rho_1)$ such that $\int_0^1\mathcal{A}'(\rho_t,\psi_t)\,dt$ is finite we set $\mu_t=\rho_t m$ and define $\boldsymbol{\nu}_t\in\mathcal{M}_{loc}(G)$ by $\boldsymbol{\nu}_t(dx,dy)=\bar\nabla\psi_t(x,y)\,\hat\rho_t(x,y)\,J(x,dy)\,m(dx)$. Then obviously we have $\mathcal{A}'(\rho_t,\psi_t)=\mathcal{A}(\mu_t,\boldsymbol{\nu}_t)$ and it is easily checked using Lemma 2.6 that $(\mu,\boldsymbol{\nu})\in\mathcal{CE}_1(\mu_0,\mu_1)$.

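Let us now prove the opposite inequality '$\ge$'. To this end, note that by a reparametrization argument similar to Lemma 4.2 the square root of the infimum on the right hand side coincides, for any $T>0$, with

$$\inf\Big\{\int_0^T\sqrt{\mathcal{A}'(\rho_t,\psi_t)}\,dt\ :\ (\rho,\psi)\in\mathcal{CE}'_T(\rho_0,\rho_1)\Big\}\ .$$

We set $\mu^{\varepsilon,i}_t := P_t[\mu_i]=\rho^{\varepsilon,i}_t m$ and $\psi^{\varepsilon,i}_t := -\log\rho^{\varepsilon,i}_t$ for $i=0,1$ and $t\in(0,\varepsilon]$. It is easily checked that the pair $(\rho^{\varepsilon,i},\psi^{\varepsilon,i})$ belongs to $\mathcal{CE}'_\varepsilon(\rho_i,\rho^{\varepsilon,i}_\varepsilon)$. Using the monotonicity of $\mathcal{I}$ under convolution as in the proof of Claim 5.6 we infer that

$$L^{\varepsilon,i} := \int_0^\varepsilon\sqrt{\mathcal{A}'(\rho^{\varepsilon,i}_t,\psi^{\varepsilon,i}_t)}\,dt = \int_0^\varepsilon\sqrt{\mathcal{I}(\mu^{\varepsilon,i}_t)}\,dt \le \varepsilon\sqrt{\mathcal{I}(\mu_i)}\ .$$

Now let $(\mu,\boldsymbol{\nu})\in\mathcal{CE}_1(\mu_0,\mu_1)$ be a geodesic and set $\mu^\varepsilon_t := P_\varepsilon[\mu_t]=\rho^\varepsilon_t m$. Proposition 4.6 and the proof of Proposition 4.10 show that the curve $t\mapsto\mu^\varepsilon_t$ is absolutely continuous w.r.t. $\mathcal{W}$ and thus there is a family of optimal velocity measures $\tilde{\boldsymbol{\nu}}^\varepsilon_t$. By Proposition 4.8 we have that $\tilde{\boldsymbol{\nu}}^\varepsilon_t=w^\varepsilon_t\hat\rho^\varepsilon_t\,Jm$ where $w^\varepsilon_t$ belongs to $T_{\rho^\varepsilon_t}$. Note that $\psi_\varepsilon>0$ by Assumption 5.1 and thus $\rho^\varepsilon_t>0$ for all $t\in(0,1)$, and moreover $j>0$. Hence it is easily checked that any limit of discrete gradients in $L^2$ w.r.t. the measure $\hat\rho^\varepsilon_t\,Jm(dx,dy)=\hat\rho^\varepsilon_t(x,y)\,j(y-x)\,dx\,dy$ coincides again a.e. with a discrete gradient. Thus we have $w^\varepsilon_t=\bar\nabla\psi^\varepsilon_t$ a.e. for a suitable function $\psi^\varepsilon:(0,1)\times\mathbb{R}^d\to\mathbb{R}$. Now observe that $(\rho^\varepsilon,\psi^\varepsilon)\in\mathcal{CE}'_1(\rho^{\varepsilon,0}_\varepsilon,\rho^{\varepsilon,1}_\varepsilon)$ and

$$L^\varepsilon := \int_0^1\sqrt{\mathcal{A}'(\rho^\varepsilon_t,\psi^\varepsilon_t)}\,dt = \int_0^1\sqrt{\mathcal{A}(\mu^\varepsilon_t,\tilde{\boldsymbol{\nu}}^\varepsilon_t)}\,dt$$

$$\le \int_0^1\sqrt{\mathcal{A}(\mu_t,\boldsymbol{\nu}_t)}\,dt = \mathcal{W}(\mu_0,\mu_1)\ ,$$

where we have used Proposition 2.8 in the second line. Finally we concatenate the three curves $(\rho^{\varepsilon,0},\psi^{\varepsilon,0})$, $(\rho^\varepsilon,\psi^\varepsilon)$ and $(\rho^{\varepsilon,1},\psi^{\varepsilon,1})$ to obtain a curve $(\tilde\rho^\varepsilon,\tilde\psi^\varepsilon)\in\mathcal{CE}'_{1+2\varepsilon}(\rho_0,\rho_1)$ which satisfies

$$\int_0^{1+2\varepsilon}\sqrt{\mathcal{A}'(\tilde\rho^\varepsilon_t,\tilde\psi^\varepsilon_t)}\,dt = L^{\varepsilon,0}+L^\varepsilon+L^{\varepsilon,1} \le \mathcal{W}(\mu_0,\mu_1) + \varepsilon\Big(\sqrt{\mathcal{I}(\mu_0)}+\sqrt{\mathcal{I}(\mu_1)}\Big)\ .$$

Letting $\varepsilon$ go to zero now yields the claim. □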